I agree. Using a List<String> seems to make more sense. FYI: I opened a JIRA for this: https://issues.apache.org/jira/browse/HADOOP-4864
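To illustrate why the packed-string approach breaks and a List<String> does not, here is a minimal, self-contained sketch (the class and method names are hypothetical, not from the Hadoop codebase). It packs two jar paths with the Linux path separator ":" and then splits them with the Windows separator ";", which collapses everything into a single entry; a List<String> carried end to end has no delimiter to mismatch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: packing entries with path.separator is lossy across
// JVMs, while a List<String> keeps the elements intact.
public class LibJarsList {
    // Packed form (status quo): elements joined with a path separator.
    static String pack(List<String> jars, String separator) {
        return String.join(separator, jars);
    }

    // Unpacking splits on whatever separator the local JVM reports.
    static List<String> unpack(String packed, String separator) {
        return Arrays.asList(packed.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        List<String> jars = Arrays.asList("/opt/lib/a.jar", "/opt/lib/b.jar");

        // Packed on a Linux submitter, where path.separator is ":".
        String packed = pack(jars, ":");

        // Split on a Windows node, where path.separator is ";": the
        // delimiter does not match, so both jars collapse into one entry.
        List<String> onWindows = unpack(packed, ";");
        System.out.println(onWindows.size()); // 1, not 2

        // A List<String> carried end to end sidesteps the mismatch.
        System.out.println(jars.size()); // 2
    }
}
```

The same sketch also shows Jason's second point: if an element itself contained the separator character, splitting would cut it apart, because nothing escapes embedded separators before packing.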
On Tue, Dec 30, 2008 at 3:53 PM, Jason Venner <[email protected]> wrote:
> The path separator is a major issue with a number of items in the
> configuration data set that are multiple items packed together via the
> path separator:
> the class path
> the distributed cache
> the input path set
>
> All suffer from the path.separator issue for two reasons:
> 1. the difference across JVMs, as indicated in the previous email
> (I had missed this!)
> 2. separator characters that happen to be embedded in the individual
> elements are not escaped before the item is added to the existing set.
>
> For all of the pain we have with these packed items, it may be simpler to
> serialize a List<String> for multi-element items rather than packing them
> with the path.separator system property.
>
> Aaron Kimball wrote:
>> Hi Stuart,
>>
>> Good sleuthing out that problem :) The correct way to submit patches is
>> to file a ticket on JIRA (https://issues.apache.org/jira/browse/HADOOP).
>> Create an account, create a new issue describing the bug, and then attach
>> the patch file. There'll be a discussion there, and others can review
>> your patch and include it in the codebase.
>>
>> Cheers,
>> - Aaron
>>
>> On Fri, Dec 12, 2008 at 12:14 PM, Stuart White <[email protected]> wrote:
>>
>>> Ok, I'll answer my own question.
>>>
>>> This is caused by the fact that Hadoop uses
>>> System.getProperty("path.separator") as the delimiter in the list of
>>> jar files passed via -libjars.
>>>
>>> If your job spans platforms, System.getProperty("path.separator")
>>> returns a different delimiter on the different platforms.
>>>
>>> My solution is to use a comma as the delimiter, rather than the
>>> path.separator.
>>>
>>> I realize a comma is, perhaps, a poor choice for a delimiter because it
>>> is valid in filenames on both Windows and Linux, but -libjars already
>>> uses it as the delimiter when listing the additional required jars.
>>> So, I figured if it's already being used as a delimiter, it's
>>> reasonable to use it internally as well.
>>>
>>> I've attached a patch (against 0.19.0) that applies this change.
>>>
>>> Now, with this change, I can submit Hadoop jobs (requiring multiple
>>> supporting jars) from my Windows laptop (via Cygwin) to my 10-node
>>> Linux Hadoop cluster.
>>>
>>> Any chance this change could be applied to the Hadoop codebase?
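The comma approach Stuart describes can be sketched in a few lines. This is a hedged illustration of the idea behind the patch, not the patch itself; the class and method names are made up for the example. Because "," is the same character on every platform, the join/split round trip is stable regardless of where the job is submitted or run.

```java
// Sketch of the patch's idea: join -libjars entries with a comma, which is
// identical on every platform, instead of the JVM-specific path.separator.
public class CommaDelimitedJars {
    // Join jar paths into one config value using a fixed "," delimiter.
    static String join(String[] jars) {
        return String.join(",", jars);
    }

    // Split the config value back into jar paths; works the same everywhere.
    static String[] split(String packed) {
        return packed.split(",");
    }

    public static void main(String[] args) {
        String[] jars = {"lib/a.jar", "lib/b.jar", "lib/c.jar"};
        String packed = join(jars);           // "lib/a.jar,lib/b.jar,lib/c.jar"
        String[] roundTrip = split(packed);   // same result on Windows and Linux
        System.out.println(roundTrip.length); // 3
    }
}
```

As the thread notes, this trades one weakness for another: a comma is legal in filenames on both platforms, so an embedded comma would still split incorrectly, but -libjars already imposes that restriction on its input.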
