Chris McConnell created HIVE-3697:
-------------------------------------

             Summary: External JAR files on HDFS can lead to race condition with hive.downloaded.resources.dir
                 Key: HIVE-3697
                 URL: https://issues.apache.org/jira/browse/HIVE-3697
             Project: Hive
          Issue Type: Bug
            Reporter: Chris McConnell


I've seen situations where using JAR files on HDFS can cause job failures 
via ClassNotFoundException (CNFE) or JVM crashes. 

This is difficult to replicate and seems to be related to JAR size and to 
latency between the client and the HDFS cluster, but I've got some example 
stack traces below. It seems that the calls made to FileSystem (copyToLocal), 
which are static and delete the current local copy when executed, can cause 
the file(s) to be removed during job processing.
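
To make the suspected window concrete, here is a minimal sketch that replays 
the interleaving step by step in a single thread. It assumes the download 
behaves as described above (delete the local copy, then re-copy it) and uses 
only java.nio.file; the class name, method names, and udf.jar are hypothetical 
and are not Hive or Hadoop APIs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

public class SharedDirRace {
    // Step 1 of the assumed download: remove the existing local copy.
    static void deleteLocalCopy(Path local) throws IOException {
        Files.deleteIfExists(local);
    }

    // Step 2 of the assumed download: write the remote bytes to the local path.
    static void copyToLocal(byte[] remote, Path local) throws IOException {
        Files.write(local, remote);
    }

    // Replays one interleaving of two jobs sharing a resources directory.
    // Returns true if job A's read fails between job B's delete and copy.
    static boolean readerFailsInWindow() {
        try {
            Path dir = Files.createTempDirectory("resources");
            Path jar = dir.resolve("udf.jar");
            byte[] remote = {1, 2, 3};

            copyToLocal(remote, jar);    // job A downloads the JAR
            deleteLocalCopy(jar);        // job B begins its own download

            boolean failed;
            try {
                Files.readAllBytes(jar); // job A tries to read the JAR now
                failed = false;
            } catch (NoSuchFileException e) {
                failed = true;           // the shared copy has vanished
            }

            copyToLocal(remote, jar);    // job B finishes its download

            // Clean up the temporary files created by this sketch.
            Files.delete(jar);
            Files.delete(dir);
            return failed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reader failed in window: " + readerFailsInWindow());
    }
}
```

In a real run the window is only hit occasionally, which would be consistent 
with the failures depending on JAR size and latency.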

We should consider changing the default for hive.downloaded.resources.dir to 
include some level of uniqueness per job. hive.session.id is not suitable for 
this, however, because multiple statements executed via the same user/session, 
which might access the same JAR files, will share the same session ID.

One proposal is to use System.nanoTime() as part of the default 
(/tmp/${user.name}/resources/System.nanoTime()/). This might be enough to 
avoid the issue, although it's not perfect (the level of precision depends on 
the JVM and system). 
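
A minimal sketch of how such a default could be assembled, assuming the layout 
proposed above. The class and method names are hypothetical and this is not 
how Hive resolves hive.downloaded.resources.dir today. Note that 
System.nanoTime() may return a negative value and is not guaranteed to differ 
between two processes started at nearly the same moment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourceDirs {
    // Hypothetical helper: builds <tmpRoot>/<user>/resources/<stamp>.
    static Path uniqueResourceDir(Path tmpRoot, String user, long stamp) {
        return tmpRoot.resolve(user)
                      .resolve("resources")
                      .resolve(Long.toString(stamp));
    }

    public static void main(String[] args) {
        // The stamp is taken once per job so every resource for that job
        // lands in the same directory.
        Path dir = uniqueResourceDir(Paths.get("/tmp"),
                                     System.getProperty("user.name"),
                                     System.nanoTime());
        System.out.println(dir);
    }
}
```

Passing the stamp in as a parameter keeps the helper deterministic, so the 
source of uniqueness can be swapped later without changing the path layout.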

If anyone else has hit this, I would like to capture environment information 
as well. Perhaps there is something else at play here. 

Here are some examples of the errors:

for i in {0..2}; do hive -S -f query.q& done
[2] 48405
[3] 48406
[4] 48407
% #
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007fb10bd931f0, pid=48407, tid=140398456698624
#
# JRE version: 6.0_31-b04
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.6-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libzip.so+0xb1f0]  __int128+0x60
#
# An error report file with more information is saved as:
# /home/.../hs_err_pid48407.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
java.lang.NoClassDefFoundError: com/example/udf/Lower
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.hive.ql.exec.FunctionTask.getUdfClass(FunctionTask.java:105)
        at org.apache.hadoop.hive.ql.exec.FunctionTask.createFunction(FunctionTask.java:75)
        at org.apache.hadoop.hive.ql.exec.FunctionTask.execute(FunctionTask.java:63)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
        at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:439)
        at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:449)
        at org.apache.hadoop.hive.cli.CliDriver.processInitFiles(CliDriver.java:485)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:692)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:607)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: com.example.udf.Lower
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        ... 24 more
FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask

Another:
for i in {0..2}; do hive -S -f query.q& done
[1] 16294 
[2] 16295 
[3] 16296 
[]$ Couldn't create directory /tmp/ctm/resources/
Couldn't create directory /tmp/ctm/resources/
