Hello Hive User Mailing List,
I'm trying to debug a custom InputFormat that I'm using in Hive. I'm
using version 0.12.0 of Hive and Hadoop 2.4.1.
I'm having trouble attaching a debugger to my InputFormat class inside
the Hive server. My session looks like this:
$ ./hive-0.12.0/bin/hive --debug
Listening for transport dt_socket at address: 8000
(I attach a debugger from IntelliJ at this point; all seems to be going
well.)
15/02/11 23:28:01 INFO Configuration.deprecation:
mapred.input.dir.recursive is deprecated. Instead, use
mapreduce.input.fileinputformat.input.dir.recursive
15/02/11 23:28:01 INFO Configuration.deprecation: mapred.max.split.size
is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
15/02/11 23:28:01 INFO Configuration.deprecation: mapred.min.split.size
is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
15/02/11 23:28:01 INFO Configuration.deprecation:
mapred.min.split.size.per.rack is deprecated. Instead, use
mapreduce.input.fileinputformat.split.minsize.per.rack
15/02/11 23:28:01 INFO Configuration.deprecation:
mapred.min.split.size.per.node is deprecated. Instead, use
mapreduce.input.fileinputformat.split.minsize.per.node
15/02/11 23:28:01 INFO Configuration.deprecation: mapred.reduce.tasks is
deprecated. Instead, use mapreduce.job.reduces
15/02/11 23:28:01 INFO Configuration.deprecation:
mapred.reduce.tasks.speculative.execution is deprecated. Instead, use
mapreduce.reduce.speculative
Logging initialized using configuration in ...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [......]
SLF4J: Found binding in [......]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2015-02-11 23:28:02.016 java[2237:86664] Unable to load realm info from
SCDynamicStore
Now I'm trying to exercise my custom InputFormat class:
hive> select * from messages;
The debugger attaches, I can step through, and everything is still going
great. The trouble starts when I run anything other than "SELECT * FROM
table", that is, any query that launches a MapReduce job. For example:
hive> select field from messages;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Error occurred during initialization of VM
ERROR: Cannot load this JVM TI agent twice, check your java command line for
duplicate jdwp options.
agent library failed to init: jdwp
Execution failed with exit status: 1
Obtaining error information
Task failed!
Task ID:
Stage-1
Logs:
/tmp/luke/hive.log
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I haven't explicitly set any environment variables like HADOOP_OPTS or
HIVE_OPTS. I'm relying on the --debug flag to do this for me when I
launch Hive. However, I do notice the following if I run "set" from the
Hive shell:
env:HADOOP_OPTS= -Djava.net.preferIPv4Stack=true
-Dhadoop.log.dir=/.../hadoop-2.4.1/logs -Dhadoop.log.file=hadoop.log
-Dhadoop.home.dir=/.../hadoop-2.4.1 -Dhadoop.id.str=luke
-Dhadoop.root.logger=INFO,console
-Djava.library.path=/.../hadoop-2.4.1/lib/native
-Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
-Xmx512m -XX:+UseParallelGC
-agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y
-Dhadoop.security.logger=INFO,NullAppender
env:HIVE_MAIN_CLIENT_DEBUG_OPTS= -XX:+UseParallelGC
-agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y
env:HIVE_CHILD_CLIENT_DEBUG_OPTS= -XX:+UseParallelGC
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n
env:HADOOP_CLIENT_OPTS=-Xmx512m -XX:+UseParallelGC
-agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y
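Since the error message says to check for duplicate jdwp options, a quick way to see how many times the agent flag appears in these variables is a small shell sketch like the one below. The helper name count_jdwp is made up for illustration, and concatenating HADOOP_OPTS with HADOOP_CLIENT_OPTS is only an assumption about how the child JVM's command line gets assembled, not something confirmed from the Hive or Hadoop scripts.

```shell
#!/bin/sh
# count_jdwp (hypothetical helper): print how many JDWP agent flags appear
# in a whitespace-separated string of JVM options.
# The body runs in a subshell so that "set -f" and the loop variables
# do not leak into the calling shell.
count_jdwp() (
    set -f                      # disable globbing while word-splitting $1
    count=0
    for opt in $1; do
        case "$opt" in
            # Match both the modern and the legacy spelling of the flag.
            -agentlib:jdwp*|-Xrunjdwp*) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
)

# Assumption: the child JVM sees both variables joined together.
combined="${HADOOP_OPTS:-} ${HADOOP_CLIENT_OPTS:-}"
echo "jdwp flags in HADOOP_OPTS:        $(count_jdwp "${HADOOP_OPTS:-}")"
echo "jdwp flags in HADOOP_CLIENT_OPTS: $(count_jdwp "${HADOOP_CLIENT_OPTS:-}")"
echo "jdwp flags when combined:         $(count_jdwp "$combined")"
```

With the values shown above, each variable carries one flag on its own, so a count of two would only appear if the two variables really are joined on one command line.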
Also, per https://issues.apache.org/jira/browse/HIVE-3936, I've
commented out line 217 of bin/hive. The surrounding block now looks like
this:
# Starting at line 210:
if [ "$DEBUG" ]; then
  if [ "$HELP" ]; then
    debug_help
    exit 0
  else
    get_debug_params "$DEBUG"
    export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS $HIVE_MAIN_CLIENT_DEBUG_OPTS"
    # export HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
  fi
fi
As I said, this fix works fine when no MR tasks need to be launched,
but I keep getting the same jdwp error when I try anything
non-trivial.
Any help is appreciated. Thank you for your time.
Luke