I've been developing a HiveStorageHandler implementation (and associated
classes) to integrate a non-file-based table storage engine into Hive. I am
currently working with version 1.3 of the Hortonworks distribution, but the
issue I've run into appears to be present in the Apache code base as well.

The issue is that when the MapReduce job runs, the map task dies with the
following exception:

    java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
        at org.apache.hadoop.fs.Path.<init>(Path.java:90)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.getPath(HiveInputFormat.java:106)
        at org.apache.hadoop.mapred.MapTask.updateJobWithSplit(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

Looking at the code for HiveInputFormat.HiveInputSplit.getPath(), I find the
following:

    public Path getPath() {
      if (inputSplit instanceof FileSplit) {
        return ((FileSplit) inputSplit).getPath();
      }
      return new Path("");
    }

This appears to mean that if my InputFormat.getSplits() method returns
InputSplit objects that do not derive from FileSplit (which is the case for my
InputFormat class, as my storage engine is not file-based), getPath() will try
to return 'new Path("")'.

The problem is that the code for the Path class specifically disallows
constructing an instance of Path with an empty string. Here is the code for
Path.checkPathArg():

    private void checkPathArg( String path ) {
      // disallow construction of a Path from an empty string
      if ( path == null ) {
        throw new IllegalArgumentException( "Can not create a Path from a null string" );
      }
      if ( path.length() == 0 ) {
        throw new IllegalArgumentException( "Can not create a Path from an empty string" );
      }
    }

So whenever HiveInputSplit.getPath() is called and 'inputSplit' is not an
instance of 'FileSplit', it attempts to construct a Path from an empty string,
which always fails with the exception shown above.
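
To show the failure mode in isolation, here is a minimal sketch that needs no
Hadoop or Hive jars. Every class in it is a simplified stand-in that I wrote for
illustration: Path, FileSplit and WrapperSplit only mimic the logic quoted
above, and EngineSplit is a hypothetical split for a non-file-based engine. It
is not the real code, only a model of it.

```java
// Self-contained model of the failure. All classes here are simplified
// stand-ins for illustration, NOT the real Hadoop/Hive classes.
class PathRepro {

    // Stand-in for org.apache.hadoop.fs.Path: keeps only the argument check.
    static class Path {
        private final String path;

        Path(String path) {
            checkPathArg(path);
            this.path = path;
        }

        private static void checkPathArg(String path) {
            if (path == null) {
                throw new IllegalArgumentException(
                    "Can not create a Path from a null string");
            }
            if (path.length() == 0) {
                throw new IllegalArgumentException(
                    "Can not create a Path from an empty string");
            }
        }

        @Override
        public String toString() {
            return path;
        }
    }

    // Stand-in for org.apache.hadoop.mapred.InputSplit.
    interface InputSplit {}

    // Stand-in for a file-based split.
    static class FileSplit implements InputSplit {
        private final Path file;

        FileSplit(Path file) {
            this.file = file;
        }

        Path getPath() {
            return file;
        }
    }

    // Hypothetical split for a storage engine that has no files.
    static class EngineSplit implements InputSplit {}

    // Stand-in for HiveInputSplit: same getPath() logic as quoted above.
    static class WrapperSplit {
        private final InputSplit inputSplit;

        WrapperSplit(InputSplit inputSplit) {
            this.inputSplit = inputSplit;
        }

        Path getPath() {
            if (inputSplit instanceof FileSplit) {
                return ((FileSplit) inputSplit).getPath();
            }
            return new Path(""); // always throws for non-file splits
        }
    }

    // Wraps a split, calls getPath(), and reports the outcome as a string.
    static String describe(InputSplit split) {
        try {
            return "ok: " + new WrapperSplit(split).getPath();
        } catch (IllegalArgumentException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new FileSplit(new Path("/tmp/data"))));
        System.out.println(describe(new EngineSplit()));
    }
}
```

With these stand-ins, the file-based split reports its path, while the
non-file split reports "failed: Can not create a Path from an empty string",
which matches the message in the stack trace.
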
So my question is: If this is a bug in Hive, can we get it fixed? If it is not
a bug in Hive but rather a misunderstanding on my part, could someone give me
some pointers on how to use InputSplit objects that do not derive from
FileSplit in such a way as to avoid tripping this issue?
Thank you for your time.
Eric Karlson