How can I get more information regarding this exception?

On Wed, Jun 17, 2015 at 1:17 AM, Saiph Kappa <saiph.ka...@gmail.com> wrote:
> Hi,
>
> I am running a simple Spark Streaming application on Hadoop 2.7.0/YARN
> (master: yarn-client) with 2 executors on different machines. However,
> while the app is running, I can see on the app web UI (Executors tab) that
> only 1 executor keeps completing tasks over time; the other executor only
> works and completes tasks for a few seconds. From the logs I can see an
> exception being raised, though it is not clear what went wrong.
>
> Here is the yarn-nodemanager log:
> «
> 2015-06-17 00:29:50,967 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1434391147618_0007_01_000003
> 2015-06-17 00:29:50,977 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 286.5 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:29:53,991 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 463.7 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:29:57,009 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 465.7 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:00,024 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 467.6 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:03,032 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 474.0 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:06,041 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 480.2 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:09,053 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 540.9 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:12,068 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 550.9 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:15,075 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 551.1 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:18,090 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 558.7 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
> 2015-06-17 00:30:20,157 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1434391147618_0007_01_000003 is : 1
> 2015-06-17 00:30:20,157 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1434391147618_0007_01_000003 and exit code: 1
> ExitCodeException exitCode=1:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>     at org.apache.hadoop.util.Shell.run(Shell.java:456)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> 2015-06-17 00:30:20,157 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
> 2015-06-17 00:30:20,157 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1434391147618_0007_01_000003
> 2015-06-17 00:30:20,157 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exit code: 1
> 2015-06-17 00:30:20,157 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Stack trace: ExitCodeException exitCode=1:
> 2015-06-17 00:30:20,157 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
> 2015-06-17 00:30:20,157 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell.run(Shell.java:456)
> 2015-06-17 00:30:20,157 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
> 2015-06-17 00:30:20,158 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
> 2015-06-17 00:30:20,158 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> 2015-06-17 00:30:20,158 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> 2015-06-17 00:30:20,158 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 2015-06-17 00:30:20,158 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 2015-06-17 00:30:20,158 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 2015-06-17 00:30:20,158 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: at java.lang.Thread.run(Thread.java:745)
> 2015-06-17 00:30:20,158 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 1
> 2015-06-17 00:30:20,158 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434391147618_0007_01_000003 transitioned from RUNNING to EXITED_WITH_FAILURE
> 2015-06-17 00:30:20,158 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_1434391147618_0007_01_000003
> 2015-06-17 00:30:20,178 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1434391147618_0007/container_1434391147618_0007_01_000003
> 2015-06-17 00:30:20,178 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=myuser OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_1434391147618_0007 CONTAINERID=container_1434391147618_0007_01_000003
> 2015-06-17 00:30:20,178 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_1434391147618_0007_01_000003 transitioned from EXITED_WITH_FAILURE to DONE
> 2015-06-17 00:30:20,179 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Removing container_1434391147618_0007_01_000003 from application application_1434391147618_0007
> 2015-06-17 00:30:20,179 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_STOP for appId application_1434391147618_0007
> 2015-06-17 00:30:20,500 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1434391147618_0007 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
> 2015-06-17 00:30:20,501 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /tmp/hadoop-myuser/nm-local-dir/usercache/myuser/appcache/application_1434391147618_0007
> 2015-06-17 00:30:20,501 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1434391147618_0007
> 2015-06-17 00:30:20,501 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1434391147618_0007 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
> 2015-06-17 00:30:20,501 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler: Scheduling Log Deletion for application: application_1434391147618_0007, with delay of 10800 seconds
> »
>
> Not sure if it is relevant, but in the output of the application I keep
> getting this message:
> «15/06/17 00:29:53 INFO ShuffledDStream: Time 1434497393000 ms is invalid as zeroTime is 1434497391000 ms and slideDuration is 4000 ms and difference is 2000 ms»
>
> I'm using Spark 1.3.2.
>
> Any ideas what could be happening?
>
> Thanks.
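The ContainersMonitorImpl records in the log can be checked mechanically. The Python sketch below is a hypothetical helper (not part of Hadoop or Spark) that pulls the physical-memory figures out of records in the format quoted above and reports the peak against the container limit. It assumes one log record per line and treats 1 GB as 1024 MB.

```python
import re

# Matches the physical-memory clause of a YARN ContainersMonitorImpl record,
# e.g. "286.5 MB of 3 GB physical memory used".
PMEM_RE = re.compile(
    r"(\d+(?:\.\d+)?) (MB|GB) of (\d+(?:\.\d+)?) (MB|GB) physical memory used"
)


def to_mb(value, unit):
    """Convert a number expressed in MB or GB to MB (1 GB = 1024 MB)."""
    return float(value) * (1024 if unit == "GB" else 1)


def peak_physical_memory(lines):
    """Return (peak_used_mb, limit_mb) over the matching lines, or None."""
    samples = []
    for line in lines:
        m = PMEM_RE.search(line)
        if m:  # records without a memory figure are skipped
            samples.append((to_mb(m.group(1), m.group(2)),
                            to_mb(m.group(3), m.group(4))))
    # Tuples compare element-wise, so max() picks the highest usage.
    return max(samples) if samples else None


# A few records copied from the NodeManager log above, one record per line.
SAMPLE = """\
2015-06-17 00:29:50,967 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1434391147618_0007_01_000003
2015-06-17 00:29:50,977 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 286.5 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:30:03,032 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 474.0 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
2015-06-17 00:30:18,090 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 30553 for container-id container_1434391147618_0007_01_000003: 558.7 MB of 3 GB physical memory used; 2.7 GB of 6.3 GB virtual memory used
"""

if __name__ == "__main__":
    used, limit = peak_physical_memory(SAMPLE.splitlines())
    print(f"peak {used:.1f} MB of {limit:.0f} MB ({100 * used / limit:.1f}%)")
```

On this excerpt the peak is 558.7 MB against a 3072 MB limit (about 18%), so the log itself gives no sign that the container was stopped for exceeding its memory allowance; the exit code 1 more likely comes from the executor process itself.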
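The ShuffledDStream message is logged at INFO level, not as an error. To the best of my knowledge, Spark Streaming's DStream code only generates an RDD for a batch time that is later than the stream's zeroTime and a whole number of slideDuration steps past it; any other batch time is skipped with exactly this message. The Python sketch below re-creates that arithmetic with the numbers from the message. The function name is hypothetical (it is not Spark's API), and the 2000 ms batch interval in the demo loop is an assumption that is only consistent with, not stated in, the post.

```python
def is_time_valid(time_ms, zero_time_ms, slide_ms):
    """Hypothetical re-creation of the check behind the
    "Time ... is invalid as zeroTime is ..." INFO message: a batch time is
    valid only if it is later than zeroTime and a whole number of
    slideDuration steps past it."""
    diff = time_ms - zero_time_ms
    return time_ms > zero_time_ms and diff % slide_ms == 0


ZERO_TIME_MS = 1434497391000  # zeroTime from the message above
SLIDE_MS = 4000               # slideDuration from the message above

if __name__ == "__main__":
    # Assumption: batches arrive every 2000 ms (not stated in the post).
    for t in range(ZERO_TIME_MS + 2000, ZERO_TIME_MS + 10000, 2000):
        print(t, t - ZERO_TIME_MS, is_time_valid(t, ZERO_TIME_MS, SLIDE_MS))
```

Under that assumption every other batch (offsets 2000 ms and 6000 ms) is skipped and the others (4000 ms and 8000 ms) are processed, which is the normal behaviour for a stream that slides every 4 s on top of 2 s batches, so the message is probably unrelated to the failing executor.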
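The NodeManager log only records that the executor process exited with code 1; the executor's own error output is written to the container's stdout and stderr files. The NonAggregatingLogHandler line shows that log aggregation is off and that deletion is delayed by 10800 seconds, so those files should stay on the failing node for about three hours, under the directory configured by yarn.nodemanager.log-dirs (default ${yarn.log.dir}/userlogs) in an application_id/container_id subdirectory. The Python sketch below builds that path for the failed container and prints the tail of its stderr. The log-directory root is an assumption: the "/path/to/userlogs" placeholder must be replaced with the node's actual yarn.nodemanager.log-dirs value.

```python
import os
import sys
from collections import deque


def container_log_path(log_dir, app_id, container_id, name="stderr"):
    """Path of a container log file, assuming the
    <log-dir>/<application_id>/<container_id>/<file> layout used by the
    NodeManager when log aggregation is disabled."""
    return os.path.join(log_dir, app_id, container_id, name)


def tail(path, n=50):
    """Return the last n lines of a text file (undecodable bytes replaced)."""
    with open(path, errors="replace") as f:
        return list(deque(f, maxlen=n))


if __name__ == "__main__":
    # Assumption: placeholder root; pass the real yarn.nodemanager.log-dirs
    # value of the node that ran the failed container as the first argument.
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "/path/to/userlogs"
    path = container_log_path(log_dir,
                              "application_1434391147618_0007",
                              "container_1434391147618_0007_01_000003")
    print(path)
    if os.path.exists(path):
        print("".join(tail(path)), end="")
```

If log aggregation were enabled (yarn.log-aggregation-enable), `yarn logs -applicationId application_1434391147618_0007` would collect the same files from all nodes instead.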