Your Tez container size is too small relative to your query and data size. Notice that the log says *1.0 GB of 1 GB physical memory used*. That is because the default Tez container/task size for your cluster is 1024 MB. You can increase it to a higher number (such as 2048 or 4096) via the setting hive.tez.container.size when you launch your cluster.
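To make the sizing concrete, here is a minimal sketch in Python that builds an EMR-style configuration list for a larger container. Treat it as an illustration under stated assumptions, not a verified recipe for your cluster: the function name tez_memory_config is made up for this example, the 0.8 heap fraction is only inferred from the -Xmx819m in your log against a 1 GB container, and including tez.am.resource.memory.mb is my assumption based on the killed process being the DAGAppMaster (the Tez AM) rather than a task.

```python
import json


def tez_memory_config(container_mb: int, heap_fraction: float = 0.8) -> list:
    """Build an EMR-style configuration list that raises Tez memory limits.

    container_mb  : container size in MB (e.g. 2048 or 4096).
    heap_fraction : share of the container given to the JVM heap. The 0.8
                    default is an assumption inferred from the log, where
                    -Xmx819m was used with a 1024 MB container.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")

    # Leave headroom for non-heap JVM memory, otherwise YARN kills the
    # container for exceeding its physical memory limit.
    heap_mb = int(container_mb * heap_fraction)

    return [
        {
            "Classification": "hive-site",
            "Properties": {
                "hive.tez.container.size": str(container_mb),
                "hive.tez.java.opts": f"-Xmx{heap_mb}m",
            },
        },
        {
            # Assumption: the AM is sized the same as the task containers.
            "Classification": "tez-site",
            "Properties": {
                "tez.am.resource.memory.mb": str(container_mb),
            },
        },
    ]


if __name__ == "__main__":
    print(json.dumps(tez_memory_config(2048), indent=2))
```

The printed JSON is in the shape EMR accepts as cluster configurations at launch. Check the values against your instance types before relying on them.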
Similarly, make sure that your YARN node manager memory setting (yarn.nodemanager.resource.memory-mb) is high enough that you can launch a container larger than 1 GB.

This article may help you understand what to tune, where, and how. It should be applicable to an EMR cluster:
https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html

On Thu, Oct 18, 2018 at 1:13 PM AgriNut solutions <agrinutsol2...@gmail.com> wrote:

> Hi Hive experts,
>
> I have an EMR cluster with 1 master node, 3 core nodes, and autoscaled
> task nodes from min 1 to max 20 nodes.
>
> The Hive table's data is 3.5 GB with 1.3e6 rows and 28 columns. We can't
> run any query on it, as it fails with a memory error.
>
> Initially I got the error below:
> ```
> Application application_1538433214426_0296 failed 2 times due to AM
> Container for appattempt_1538433214426_0296_000002 exited with exitCode:
> -104
> *Failing this attempt.Diagnostics: Container
> [pid=20906,containerID=container_1538433214426_0296_02_000001] is running
> beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical
> memory used; 2.8 GB of 5 GB virtual memory used. Killing container.*
> Dump of the process-tree for container_1538433214426_0296_02_000001 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 20906 20904 20906 20906 (bash) 0 0 115863552 670 /bin/bash -c
> /usr/lib/jvm/java-openjdk/bin/java -Xmx819m
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1538433214426_0296/container_1538433214426_0296_02_000001/tmp
> -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
> -XX:+UseParallelGC
> -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
> -Dlog4j.configuration=tez-container-log4j.properties
> -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001
> -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=''
> org.apache.tez.dag.app.DAGAppMaster --session
> 1>/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001/stdout
> 2>/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001/stderr
> |- 20921 20906 20906 20906 (java) 4140 141 2911690752 263307
> /usr/lib/jvm/java-openjdk/bin/java -Xmx819m
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1538433214426_0296/container_1538433214426_0296_02_000001/tmp
> -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
> -XX:+UseParallelGC
> -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
> -Dlog4j.configuration=tez-container-log4j.properties
> -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001
> -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=
> org.apache.tez.dag.app.DAGAppMaster --session
> *Container killed on request. Exit code is 143*
> *Container exited with a non-zero exit code 143*
> For more detailed output, check the application tracking page:
> http://ip-172-24-11-108.us-east-2.compute.internal:8088/cluster/app/application_1538433214426_0296
> Then click on links to logs of each attempt.
> . Failing the application.
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Application
> application_1538433214426_0296 failed 2 times due to AM Container for
> appattempt_1538433214426_0296_000002 exited with exitCode: -104
> Failing this attempt.Diagnostics: Container
> [pid=20906,containerID=container_1538433214426_0296_02_000001] is running
> beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical
> memory used; 2.8 GB of 5 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1538433214426_0296_02_000001 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 20906 20904 20906 20906 (bash) 0 0 115863552 670 /bin/bash -c
> /usr/lib/jvm/java-openjdk/bin/java -Xmx819m
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1538433214426_0296/container_1538433214426_0296_02_000001/tmp
> -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
> -XX:+UseParallelGC
> -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
> -Dlog4j.configuration=tez-container-log4j.properties
> -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001
> -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=''
> org.apache.tez.dag.app.DAGAppMaster --session
> 1>/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001/stdout
> 2>/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001/stderr
> |- 20921 20906 20906 20906 (java) 4140 141 2911690752 263307
> /usr/lib/jvm/java-openjdk/bin/java -Xmx819m
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1538433214426_0296/container_1538433214426_0296_02_000001/tmp
> -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
> -XX:+UseParallelGC
> -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
> -Dlog4j.configuration=tez-container-log4j.properties
> -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001
> -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=
> org.apache.tez.dag.app.DAGAppMaster --session
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> For more detailed output, check the application tracking page:
> http://ip-172-24-11-108.us-east-2.compute.internal:8088/cluster/app/application_1538433214426_0296
> Then click on links to logs of each attempt.
> . Failing the application.
> ```
> Can anyone help identify what the issue might be? Any suggestions would
> help. Thanks in advance.
>
> Also, no matter how many nodes/mappers and reducers I have, the query
> executes in only one container. Any help on this too. Thanks.
>

--
Thai