I spoke with DB offline about this a little while ago and he confirmed that he was able to access the jar from the driver.
The issue appears to be a general Java issue rather than a Spark one: a plain "new" expression is resolved by the class loader that defined the calling class, not by the thread's context class loader, so you can't directly instantiate a class that is only visible through a dynamically loaded jar. I reproduced it locally outside of Spark with:
---
URLClassLoader urlClassLoader = new URLClassLoader(
    new URL[] { new File("myotherjar.jar").toURI().toURL() }, null);
Thread.currentThread().setContextClassLoader(urlClassLoader);
// This is the line that reproduces the problem: "new" does not consult
// the context class loader.
MyClassFromMyOtherJar obj = new MyClassFromMyOtherJar();
---
I was able to load the class with reflection instead.

On Sun, May 18, 2014 at 11:58 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> @db - it's possible that you aren't including the jar in the classpath
> of your driver program (I think this is what mridul was suggesting).
> It would be helpful to see the stack trace of the CNFE.
>
> - Patrick
>
> On Sun, May 18, 2014 at 11:54 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> > @xiangrui - we don't expect these to be present on the system
> > classpath, because they get dynamically added by Spark (e.g. your
> > application can call sc.addJar well after the JVMs have started).
> >
> > @db - I'm pretty surprised to see that behavior. It's definitely not
> > intended that users need reflection to instantiate their classes -
> > something odd is going on in your case. If you could create an
> > isolated example and post it to the JIRA, that would be great.
> >
> > On Sun, May 18, 2014 at 9:58 AM, Xiangrui Meng <men...@gmail.com> wrote:
> >> I created a JIRA: https://issues.apache.org/jira/browse/SPARK-1870
> >>
> >> DB, could you add more info to that JIRA? Thanks!
> >>
> >> -Xiangrui
> >>
> >> On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng <men...@gmail.com> wrote:
> >>> Btw, I tried
> >>>
> >>> rdd.map { i =>
> >>>   System.getProperty("java.class.path")
> >>> }.collect()
> >>>
> >>> but didn't see the jars added via "--jars" on the executor classpath.
> >>>
> >>> -Xiangrui
> >>>
> >>> On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <men...@gmail.com> wrote:
> >>>> I can reproduce the error with Spark 1.0-RC and YARN (CDH-5). The
> >>>> reflection approach mentioned by DB didn't work either. I checked the
> >>>> distributed cache on a worker node and found the jar there. It is also
> >>>> in the Environment tab of the WebUI. The workaround is making an
> >>>> assembly jar.
> >>>>
> >>>> DB, could you create a JIRA and describe what you have found so far? Thanks!
> >>>>
> >>>> Best,
> >>>> Xiangrui
> >>>>
> >>>> On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> >>>>> Can you try moving your mapPartitions to another class/object which is
> >>>>> referenced only after sc.addJar?
> >>>>>
> >>>>> I would suspect the CNFE is coming while loading the class containing
> >>>>> mapPartitions, before addJar is executed.
> >>>>>
> >>>>> In general though, dynamic loading of classes means you use reflection to
> >>>>> instantiate them, since the expectation is that you don't know which
> >>>>> implementation provides the interface ... If you statically know it a
> >>>>> priori, you bundle it in your classpath.
> >>>>>
> >>>>> Regards
> >>>>> Mridul
> >>>>> On 17-May-2014 7:28 am, "DB Tsai" <dbt...@stanford.edu> wrote:
> >>>>>
> >>>>>> Finally found a way out of the ClassLoader maze! It took me some time to
> >>>>>> understand how it works; I think it is worth documenting in a separate
> >>>>>> thread.
> >>>>>>
> >>>>>> We're trying to add an external utility.jar which contains CSVRecordParser,
> >>>>>> and we added the jar to the executors through the sc.addJar API.
> >>>>>>
> >>>>>> If the instance of CSVRecordParser is created without reflection, it
> >>>>>> raises *ClassNotFoundException*.
> >>>>>>
> >>>>>> data.mapPartitions(lines => {
> >>>>>>   // Direct instantiation: this is the call that raises the exception.
> >>>>>>   val csvParser = new CSVRecordParser(delimiter.charAt(0))
> >>>>>>   lines.foreach(line => {
> >>>>>>     val lineElems = csvParser.parseLine(line)
> >>>>>>   })
> >>>>>>   ...
> >>>>>>   ...
> >>>>>> })
> >>>>>>
> >>>>>>
> >>>>>> If the instance of CSVRecordParser is created through reflection, it works.
> >>>>>>
> >>>>>> data.mapPartitions(lines => {
> >>>>>>   // Look the class up through the thread's context class loader.
> >>>>>>   val loader = Thread.currentThread.getContextClassLoader
> >>>>>>   val CSVRecordParser =
> >>>>>>     loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
> >>>>>>
> >>>>>>   val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
> >>>>>>     .newInstance(delimiter.charAt(0).asInstanceOf[Character])
> >>>>>>
> >>>>>>   val parseLine = CSVRecordParser
> >>>>>>     .getDeclaredMethod("parseLine", classOf[String])
> >>>>>>
> >>>>>>   lines.foreach(line => {
> >>>>>>     val lineElems = parseLine.invoke(csvParser,
> >>>>>>       line).asInstanceOf[Array[String]]
> >>>>>>   })
> >>>>>>   ...
> >>>>>>   ...
> >>>>>> })
> >>>>>>
> >>>>>>
> >>>>>> This is identical to this question,
> >>>>>>
> >>>>>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
> >>>>>>
> >>>>>> It's not intuitive for users to load external classes through reflection,
> >>>>>> but the couple of available alternatives, including 1) messing around with
> >>>>>> the systemClassLoader by calling systemClassLoader.addURL through reflection
> >>>>>> or 2) forking another JVM to add jars into the classpath before the
> >>>>>> bootstrap loader, are very tricky.
> >>>>>>
> >>>>>> Any thoughts on fixing it properly?
> >>>>>>
> >>>>>> @Xiangrui,
> >>>>>> netlib-java jniloader is loaded from netlib-java through reflection, so
> >>>>>> this problem will not be seen.
> >>>>>>
> >>>>>> Sincerely,
> >>>>>>
> >>>>>> DB Tsai
> >>>>>> -------------------------------------------------------
> >>>>>> My Blog: https://www.dbtsai.com
> >>>>>> LinkedIn: https://www.linkedin.com/in/dbtsai
> >>>>>>
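
P.S. For anyone who wants to poke at the class loader behavior discussed in this thread without building a second jar, here is a rough, self-contained Java sketch. It is only an illustration under some assumptions, not what Spark does internally: the names ContextLoaderSketch, DynamicallyLoadedGreeter and greet are made up for the example, and it uses javax.tools to compile a tiny class into a temp directory, so it needs a JDK rather than a bare JRE. It sets the context class loader the same way as in my repro, then checks whether the caller's own loader can see the class (Class.forName(String) resolves through the caller's loader, much like a "new" expression does) and finally goes through the context class loader with reflection.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class ContextLoaderSketch {

    // Hypothetical class name, standing in for MyClassFromMyOtherJar.
    static final String CLASS_NAME = "DynamicallyLoadedGreeter";

    /** Compiles a tiny class into a temp directory that is not on the classpath. */
    static Path compileOutsideClasspath() throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("this sketch needs a JDK, not just a JRE");
        }
        Path dir = Files.createTempDirectory("otherjar");
        Path src = dir.resolve(CLASS_NAME + ".java");
        String code = "public class " + CLASS_NAME + " {\n"
                + "  public String greet(String n) { return \"hello \" + n; }\n"
                + "}\n";
        Files.write(src, code.getBytes(StandardCharsets.UTF_8));
        if (javac.run(null, null, null, "-d", dir.toString(), src.toString()) != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return dir;
    }

    /** True if the loader that defined this class can see CLASS_NAME. */
    static boolean visibleToCallersLoader() {
        try {
            Class.forName(CLASS_NAME); // uses the caller's loader, not the context loader
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Loads CLASS_NAME through the context class loader and calls greet() reflectively. */
    static String greetViaReflection(String name) throws Exception {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        Class<?> cls = loader.loadClass(CLASS_NAME);
        Object obj = cls.getConstructor().newInstance();
        Method greet = cls.getMethod("greet", String.class);
        return (String) greet.invoke(obj, name);
    }

    /** Returns { visible-to-caller's-loader?, result of the reflective call }. */
    static String[] demo() {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        try {
            Path dir = compileOutsideClasspath();
            // Parent is null, as in the repro, so only the temp directory
            // (plus the bootstrap classes) is visible through this loader.
            try (URLClassLoader urlClassLoader =
                     new URLClassLoader(new URL[] { dir.toUri().toURL() }, null)) {
                Thread.currentThread().setContextClassLoader(urlClassLoader);
                return new String[] {
                    String.valueOf(visibleToCallersLoader()),
                    greetViaReflection("spark")
                };
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // Put the original context class loader back.
            Thread.currentThread().setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println("visible to caller's loader: " + result[0]);
        System.out.println("via reflection: " + result[1]);
    }
}
```

If I have this right, the first value shows that setting the context class loader does not make the class visible to code that resolves it through its own defining loader, while the reflective path still works. That matches the repro at the top of this message; it does not explain the YARN case Xiangrui hit, where reflection failed too.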