I've been developing a HiveStorageHandler class (and associated classes) to 
integrate a non-file-based table storage engine into Hive.  I am currently 
working with version 1.3 of the Hortonworks distribution, but the issue that 
I've run into appears to be present in the Apache code base as well.

The specific issue that occurs is that when the MapReduce program is run, it 
dies with the following exception:

java.lang.IllegalArgumentException: Can not create a Path from an empty string
        at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
        at org.apache.hadoop.fs.Path.<init>(Path.java:90)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.getPath(HiveInputFormat.java:106)
        at org.apache.hadoop.mapred.MapTask.updateJobWithSplit(MapTask.java:450)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

Looking at the code for HiveInputFormat.HiveInputSplit.getPath() (the frame 
shown in the stack trace), I find the following:

public Path getPath() {
  if (inputSplit instanceof FileSplit) {
    return ((FileSplit) inputSplit).getPath();
  }
  return new Path("");
}

This appears to mean that if my InputFormat.getSplits() method returns 
InputSplit objects that do not derive from FileSplit (which is the case for my 
InputFormat class, as my storage engine is not file-based), getPath() will try 
to return 'new Path("")'.

The problem is that the code for the Path class specifically disallows 
constructing an instance of Path with an empty string.  Here is the code for 
Path.checkPathArg():

private void checkPathArg( String path ) {
  // disallow construction of a Path from an empty string
  if ( path == null ) {
    throw new IllegalArgumentException(
        "Can not create a Path from a null string");
  }
  if ( path.length() == 0 ) {
    throw new IllegalArgumentException(
        "Can not create a Path from an empty string");
  }
}

So if HiveInputSplit.getPath() is ever called when 'inputSplit' is not an 
instance of 'FileSplit', it attempts to construct a Path from an empty string, 
which always fails with an IllegalArgumentException.
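
To make the failure path concrete, here is a minimal, self-contained sketch. 
It does not use the real Hadoop or Hive classes: Path, InputSplit and FileSplit 
below are simplified stand-ins that copy only the logic quoted above, and 
EngineSplit and EngineFileSplit are hypothetical names invented for this 
illustration. It shows that a split not derived from FileSplit triggers the 
exception, while a split that derives from FileSplit and carries a placeholder 
path does not. Whether a placeholder path is acceptable to the rest of Hive is 
an assumption I have not verified.

```java
// Simplified stand-ins for the Hadoop/Hive types; NOT the real classes.
public class SplitPathDemo {

    /** Stand-in for org.apache.hadoop.fs.Path: rejects null and empty strings. */
    static final class Path {
        private final String value;

        Path(String value) {
            if (value == null) {
                throw new IllegalArgumentException(
                        "Can not create a Path from a null string");
            }
            if (value.length() == 0) {
                throw new IllegalArgumentException(
                        "Can not create a Path from an empty string");
            }
            this.value = value;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    /** Stand-in for org.apache.hadoop.mapred.InputSplit. */
    interface InputSplit { }

    /** Stand-in for org.apache.hadoop.mapred.FileSplit. */
    static class FileSplit implements InputSplit {
        private final Path path;

        FileSplit(Path path) {
            this.path = path;
        }

        Path getPath() {
            return path;
        }
    }

    /** Mirrors the getPath() logic quoted above, taking the split as an argument. */
    static Path getPath(InputSplit inputSplit) {
        if (inputSplit instanceof FileSplit) {
            return ((FileSplit) inputSplit).getPath();
        }
        return new Path(""); // always throws: empty strings are rejected
    }

    /** Hypothetical split for a non-file-based engine (does not extend FileSplit). */
    static final class EngineSplit implements InputSplit { }

    /** Hypothetical variant: derives from FileSplit and carries a placeholder path. */
    static final class EngineFileSplit extends FileSplit {
        EngineFileSplit(Path placeholder) {
            super(placeholder);
        }
    }

    public static void main(String[] args) {
        try {
            getPath(new EngineSplit());
        } catch (IllegalArgumentException e) {
            System.out.println("non-FileSplit: " + e.getMessage());
        }
        Path p = getPath(new EngineFileSplit(new Path("/placeholder/engine-table")));
        System.out.println("FileSplit subclass: " + p);
    }
}
```

The placeholder value "/placeholder/engine-table" is arbitrary; the only 
property the sketch relies on is that it is a non-empty string.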

So my question is: If this is a bug in Hive, can we get it fixed?  If it is not 
a bug in Hive but rather a misunderstanding on my part, could someone give me 
some pointers on how to use InputSplit objects that do not derive from 
FileSplit in such a way as to avoid tripping this issue?

Thank you for your time.

Eric Karlson
