Re: Hive tables query failing for simple query with memory error.

Thai Bui Fri, 19 Oct 2018 09:06:26 -0700

Your Tez container size is too small relative to your query and data
size. Notice that the log says *1.0 GB of 1 GB physical memory used*. That is
because the default Tez container/task size for your cluster is 1024 MB. You
can increase it to a higher number (such as 2048 or 4096) via the setting
hive.tez.container.size when you launch your cluster.
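
As a rough sketch of the sizing arithmetic (not a definitive tuning recipe):
the log in the quoted message shows -Xmx819m inside a 1 GB container, i.e. a
JVM heap of about 80% of the container. The hypothetical Python helper below
applies the same ratio to a larger container and prints the matching settings.
The function name and the 0.8 default are assumptions for illustration.
hive.tez.java.opts and tez.am.resource.memory.mb are not mentioned above; they
are included on the assumption that the heap should grow with the container,
and because the container killed in the log is the Tez application master
(DAGAppMaster), which is sized separately from the task containers.

```python
def tez_memory_settings(container_mb, heap_ratio=0.8):
    """Return Hive SET statements for a Tez container of container_mb megabytes.

    heap_ratio is an assumed rule of thumb (heap = 80% of the container),
    chosen to match the -Xmx819m / 1024 MB pair seen in the log.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    heap_mb = int(container_mb * heap_ratio)  # leave headroom for non-heap memory
    return [
        f"SET hive.tez.container.size={container_mb};",
        f"SET hive.tez.java.opts=-Xmx{heap_mb}m;",
        # Assumption: size the Tez AM the same as the task containers.
        f"SET tez.am.resource.memory.mb={container_mb};",
    ]


if __name__ == "__main__":
    for statement in tez_memory_settings(4096):
        print(statement)
```

The AM setting may need to be in place before the Tez session starts (for
example in the cluster configuration) rather than changed mid-session.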


Similarly, make sure that your YARN NodeManager memory setting is high enough
(via yarn.nodemanager.resource.memory-mb) so that you can launch a
container larger than 1 GB in size.
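
A minimal sketch of that check, assuming you already know the values
configured on your cluster. The function name and the example numbers are
hypothetical. yarn.scheduler.maximum-allocation-mb is not mentioned above; it
is included on the assumption that the scheduler's per-container cap also
limits how large a container can be.

```python
def fits_in_yarn(container_mb, nodemanager_mb, max_allocation_mb):
    """Return True if a container of container_mb megabytes can be granted.

    nodemanager_mb    -- value of yarn.nodemanager.resource.memory-mb
    max_allocation_mb -- value of yarn.scheduler.maximum-allocation-mb
    """
    return container_mb <= min(nodemanager_mb, max_allocation_mb)


if __name__ == "__main__":
    # Made-up example values; substitute the ones from your own yarn-site.xml.
    print(fits_in_yarn(4096, nodemanager_mb=12288, max_allocation_mb=8192))
    print(fits_in_yarn(4096, nodemanager_mb=2048, max_allocation_mb=8192))
```

If the check fails, either raise the YARN limits or pick a smaller container
size.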

This article may help you understand what to tune, where, and how.
It should also be applicable to an EMR cluster:
https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html



On Thu, Oct 18, 2018 at 1:13 PM AgriNut solutions <agrinutsol2...@gmail.com>
wrote:

> Hi Hive experts,
>
> I have an EMR cluster with 1 master node, 3 core nodes, and autoscaled
> task nodes (min 1, max 20).
>
> The Hive table's data is 3.5 GB with 1.3e6 rows and 28 columns. We can't
> run any query on it, as it fails with a memory error.
>
> Initially I got the error below:
> ```
> Application application_1538433214426_0296 failed 2 times due to AM
> Container for appattempt_1538433214426_0296_000002 exited with  exitCode:
> -104
> *Failing this attempt.Diagnostics: Container
> [pid=20906,containerID=container_1538433214426_0296_02_000001] is running
> beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical
> memory used; 2.8 GB of 5 GB virtual memory used. Killing container.*
> Dump of the process-tree for container_1538433214426_0296_02_000001 :
>         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>         |- 20906 20904 20906 20906 (bash) 0 0 115863552 670 /bin/bash -c
> /usr/lib/jvm/java-openjdk/bin/java  -Xmx819m
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1538433214426_0296/container_1538433214426_0296_02_000001/tmp
> -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
> -XX:+UseParallelGC
> -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
> -Dlog4j.configuration=tez-container-log4j.properties
> -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001
> -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=''
> org.apache.tez.dag.app.DAGAppMaster --session
> 1>/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001/stdout
> 2>/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001/stderr
>         |- 20921 20906 20906 20906 (java) 4140 141 2911690752 263307
> /usr/lib/jvm/java-openjdk/bin/java -Xmx819m
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1538433214426_0296/container_1538433214426_0296_02_000001/tmp
> -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
> -XX:+UseParallelGC
> -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
> -Dlog4j.configuration=tez-container-log4j.properties
> -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001
> -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=
> org.apache.tez.dag.app.DAGAppMaster --session
> *Container killed on request. Exit code is 143*
> *Container exited with a non-zero exit code 143*
> For more detailed output, check the application tracking page:
> http://ip-172-24-11-108.us-east-2.compute.internal:8088/cluster/app/application_1538433214426_0296
> Then click on links to logs of each attempt.
> . Failing the application.
> FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Application
> application_1538433214426_0296 failed 2 times due to AM Container for
> appattempt_1538433214426_0296_000002 exited with  exitCode: -104
> Failing this attempt.Diagnostics: Container
> [pid=20906,containerID=container_1538433214426_0296_02_000001] is running
> beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical
> memory used; 2.8 GB of 5 GB virtual memory used. Killing container.
> Dump of the process-tree for container_1538433214426_0296_02_000001 :
>         |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>         |- 20906 20904 20906 20906 (bash) 0 0 115863552 670 /bin/bash -c
> /usr/lib/jvm/java-openjdk/bin/java  -Xmx819m
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1538433214426_0296/container_1538433214426_0296_02_000001/tmp
> -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
> -XX:+UseParallelGC
> -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
> -Dlog4j.configuration=tez-container-log4j.properties
> -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001
> -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=''
> org.apache.tez.dag.app.DAGAppMaster --session
> 1>/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001/stdout
> 2>/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001/stderr
>         |- 20921 20906 20906 20906 (java) 4140 141 2911690752 263307
> /usr/lib/jvm/java-openjdk/bin/java -Xmx819m
> -Djava.io.tmpdir=/mnt/yarn/usercache/hadoop/appcache/application_1538433214426_0296/container_1538433214426_0296_02_000001/tmp
> -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
> -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA
> -XX:+UseParallelGC
> -Dlog4j.configuratorClass=org.apache.tez.common.TezLog4jConfigurator
> -Dlog4j.configuration=tez-container-log4j.properties
> -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1538433214426_0296/container_1538433214426_0296_02_000001
> -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel=
> org.apache.tez.dag.app.DAGAppMaster --session
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143
> For more detailed output, check the application tracking page:
> http://ip-172-24-11-108.us-east-2.compute.internal:8088/cluster/app/application_1538433214426_0296
> Then click on links to logs of each attempt.
> . Failing the application.
> ```
> Can anyone help with what the issue might be? Any suggestions would help.
> Thanks in advance.
> Also, no matter how many nodes, mappers, and reducers I have, the query
> executes in only one container. Any help with this too would be appreciated.
> Thanks.
>


-- 
Thai
