Ok, I'll answer my own question.
This is caused by the fact that Hadoop uses
System.getProperty("path.separator") as the delimiter when it stores the
list of jar files passed via -libjars.
If your job spans platforms, System.getProperty("path.separator")
returns a different delimiter on each platform (";" on Windows, ":" on
Linux), so a list written on the client is not split correctly on the
cluster.
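
To make the failure concrete, here is a minimal standalone sketch (not Hadoop code). The class name, method names, and jar paths are made up for illustration; the separators are hard-coded to stand in for what System.getProperty("path.separator") returns on Windows and on Linux.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of Hadoop.
class PathSeparatorMismatch {

  // Joins entries with the given separator, as the submitting client would.
  static String join(List<String> entries, String separator) {
    return String.join(separator, entries);
  }

  // Splits a stored classpath string with the given separator,
  // as the cluster side would.
  static List<String> split(String classpath, String separator) {
    List<String> out = new ArrayList<>();
    StringTokenizer tokens = new StringTokenizer(classpath, separator);
    while (tokens.hasMoreTokens()) {
      out.add(tokens.nextToken());
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> jars = Arrays.asList("/libs/a.jar", "/libs/b.jar");

    // Client on Windows: path.separator is ";"
    String written = join(jars, ";");

    // Cluster on Linux: path.separator is ":"
    List<String> read = split(written, ":");

    // The two jars come back as a single bogus entry.
    System.out.println(read.size() + " entry: " + read);
  }
}
```

Splitting with the same separator that was used for writing gives back both entries, which is why the problem only shows up when the client and the cluster run on different platforms.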
My solution is to use a comma as the delimiter, rather than path.separator.
I realize a comma is, perhaps, a poor choice for a delimiter because it
is valid in filenames on both Windows and Linux, but -libjars already uses
it as the delimiter when listing the additional required jars. So, I
figured that if it's already being used as a delimiter there, then it's
reasonable to use it internally as well.
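
As a rough sketch of the comma-delimiter idea described above (standalone code, not the actual DistributedCache methods; the class and method names are hypothetical), the value is appended and parsed with the same fixed delimiter on every platform:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of Hadoop.
class CommaDelimitedClasspath {

  // Fixed delimiter, independent of the platform's path.separator.
  static final String DELIMITER = ",";

  // Appends one archive to the stored classpath value (null if unset).
  static String append(String classpath, String archive) {
    return classpath == null ? archive : classpath + DELIMITER + archive;
  }

  // Parses the stored value back into its entries.
  static List<String> parse(String classpath) {
    List<String> out = new ArrayList<>();
    if (classpath == null) {
      return out;
    }
    StringTokenizer tokens = new StringTokenizer(classpath, DELIMITER);
    while (tokens.hasMoreTokens()) {
      out.add(tokens.nextToken());
    }
    return out;
  }

  public static void main(String[] args) {
    String stored = append(append(null, "/libs/a.jar"), "/libs/b.jar");
    System.out.println(parse(stored)); // both jars, on any platform

    // The caveat mentioned above: a comma inside a filename is
    // indistinguishable from the delimiter and splits the entry in two.
    System.out.println(parse("/libs/my,lib.jar").size());
  }
}
```

The second call in main shows the trade-off: a jar whose name contains a comma would be split into two entries, but such a name could not be passed through -libjars in the first place.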
I've attached a patch (against 0.19.0) that applies this change.
With this change, I can submit Hadoop jobs (requiring multiple
supporting jars) from my Windows laptop (via Cygwin) to my 10-node
Linux Hadoop cluster.
Any chance this change could be applied to the Hadoop codebase?
diff -ur src/core/org/apache/hadoop/filecache/DistributedCache.java src_working/core/org/apache/hadoop/filecache/DistributedCache.java
--- src/core/org/apache/hadoop/filecache/DistributedCache.java 2008-11-13 21:09:36.000000000 -0600
+++ src_working/core/org/apache/hadoop/filecache/DistributedCache.java 2008-12-12 14:07:48.865460800 -0600
@@ -710,7 +710,7 @@
     throws IOException {
     String classpath = conf.get("mapred.job.classpath.archives");
     conf.set("mapred.job.classpath.archives", classpath == null ? archive
-             .toString() : classpath + System.getProperty("path.separator")
+             .toString() : classpath + ","
              + archive.toString());
     FileSystem fs = FileSystem.get(conf);
     URI uri = fs.makeQualified(archive).toUri();
@@ -727,8 +727,7 @@
     String classpath = conf.get("mapred.job.classpath.archives");
     if (classpath == null)
       return null;
-    ArrayList list = Collections.list(new StringTokenizer(classpath, System
-        .getProperty("path.separator")));
+    ArrayList list = Collections.list(new StringTokenizer(classpath, ","));
     Path[] paths = new Path[list.size()];
     for (int i = 0; i < list.size(); i++) {
       paths[i] = new Path((String) list.get(i));