Hello Hive User Mailing List,

I'm trying to debug a custom InputFormat that I'm using in Hive. I'm using version 0.12.0 of Hive and Hadoop 2.4.1.

I'm having trouble attaching a debugger to my InputFormat class inside the Hive server. My session looks like this:

$ ./hive-0.12.0/bin/hive --debug
Listening for transport dt_socket at address: 8000

(I attach a debugger from IntelliJ at this point; all seems to be going well.)

15/02/11 23:28:01 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
15/02/11 23:28:01 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
15/02/11 23:28:01 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
15/02/11 23:28:01 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
15/02/11 23:28:01 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
15/02/11 23:28:01 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
15/02/11 23:28:01 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative

Logging initialized using configuration in ...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [......]
SLF4J: Found binding in [......]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2015-02-11 23:28:02.016 java[2237:86664] Unable to load realm info from SCDynamicStore

Now I'm trying to exercise my custom InputFormat class:

hive> select * from messages;

The debugger attaches, I can step through, and everything is still going great. The trouble starts when I run anything other than a plain "select * from table", i.e. any query that launches a MapReduce job. For example:

hive> select field from messages;

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Error occurred during initialization of VM
ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options.
agent library failed to init: jdwp
Execution failed with exit status: 1
Obtaining error information

Task failed!
Task ID:
  Stage-1

Logs:

/tmp/luke/hive.log
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

I haven't explicitly set any environment variables like HADOOP_OPTS or HIVE_OPTS. I'm relying on the --debug flag to do this for me when I launch Hive. However, I do notice the following if I run "set" from the Hive shell:

env:HADOOP_OPTS= -Djava.net.preferIPv4Stack=true -Dhadoop.log.dir=/.../hadoop-2.4.1/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/.../hadoop-2.4.1 -Dhadoop.id.str=luke -Dhadoop.root.logger=INFO,console -Djava.library.path=/.../hadoop-2.4.1/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx512m -XX:+UseParallelGC -agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y -Dhadoop.security.logger=INFO,NullAppender
env:HIVE_MAIN_CLIENT_DEBUG_OPTS= -XX:+UseParallelGC -agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y
env:HIVE_CHILD_CLIENT_DEBUG_OPTS= -XX:+UseParallelGC -agentlib:jdwp=transport=dt_socket,server=y,suspend=n
env:HADOOP_CLIENT_OPTS=-Xmx512m -XX:+UseParallelGC -agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y
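If I'm reading that dump right, both HADOOP_OPTS and HADOOP_CLIENT_OPTS carry the same -agentlib:jdwp flag, so any JVM launched with both sets concatenated onto its command line would request the agent twice, which matches the error. A trimmed-down sketch of my reading (the values below are just the debug-relevant parts of the env dump; most flags and paths are omitted):

```shell
# Trimmed-down versions of the two variables from the env dump above
# (only the debug-relevant flags kept; this is my reconstruction, not a log):
HADOOP_OPTS='-Xmx512m -XX:+UseParallelGC -agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y'
HADOOP_CLIENT_OPTS='-Xmx512m -XX:+UseParallelGC -agentlib:jdwp=transport=dt_socket,server=y,address=8000,suspend=y'

# If a child JVM gets both sets concatenated, the jdwp agent appears
# twice on its command line; count the occurrences:
echo "$HADOOP_OPTS $HADOOP_CLIENT_OPTS" | grep -o 'agentlib:jdwp' | wc -l
# two matches -> "Cannot load this JVM TI agent twice"
```

So my working theory is that the agent flag needs to be stripped from whichever set the child MR task inherits, but I don't know where that composition happens.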

Also, per https://issues.apache.org/jira/browse/HIVE-3936, I've commented out line 217 of bin/hive, so that block now looks like this:

# Starting at line 210:
if [ "$DEBUG" ]; then
  if [ "$HELP" ]; then
    debug_help
    exit 0
  else
    get_debug_params "$DEBUG"
    export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS $HIVE_MAIN_CLIENT_DEBUG_OPTS"
#    export HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
  fi
fi

As I said, this fix works fine when no MR tasks need to be launched, but I keep hitting the same jdwp error as soon as I try anything non-trivial.

Any help is appreciated. Thank you for your time.

Luke
