I can reproduce the error with Spark 1.0-RC on YARN (CDH-5). The reflection approach mentioned by DB didn't work for me either. I checked the distributed cache on a worker node and found the jar there. The jar is also listed in the Environment tab of the WebUI. The workaround is to build an assembly jar.
DB, could you create a JIRA and describe what you have found so far? Thanks!

Best,
Xiangrui

On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
> Can you try moving your mapPartitions to another class/object which is
> referenced only after sc.addJar?
>
> I would suspect the CNFEx is thrown while loading the class containing
> mapPartitions, before addJars is executed.
>
> In general though, dynamic loading of classes means you use reflection to
> instantiate them, since the expectation is that you don't know which
> implementation provides the interface ... If you statically know it a
> priori, you bundle it in your classpath.
>
> Regards
> Mridul
> On 17-May-2014 7:28 am, "DB Tsai" <dbt...@stanford.edu> wrote:
>
>> Finally found a way out of the ClassLoader maze! It took me some time to
>> understand how it works; I think it is worth documenting in a separate
>> thread.
>>
>> We're trying to add an external utility.jar which contains CSVRecordParser,
>> and we added the jar to the executors through the sc.addJar API.
>>
>> If the instance of CSVRecordParser is created without reflection, it
>> raises *ClassNotFoundException*.
>>
>> data.mapPartitions(lines => {
>>   val csvParser = new CSVRecordParser(delimiter.charAt(0))
>>   lines.foreach(line => {
>>     val lineElems = csvParser.parseLine(line)
>>   })
>>   ...
>>   ...
>> })
>>
>>
>> If the instance of CSVRecordParser is created through reflection, it works.
>>
>> data.mapPartitions(lines => {
>>   val loader = Thread.currentThread.getContextClassLoader
>>   val CSVRecordParser =
>>     loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
>>
>>   val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
>>     .newInstance(delimiter.charAt(0).asInstanceOf[Character])
>>
>>   val parseLine = CSVRecordParser
>>     .getDeclaredMethod("parseLine", classOf[String])
>>
>>   lines.foreach(line => {
>>     val lineElems = parseLine.invoke(csvParser,
>>       line).asInstanceOf[Array[String]]
>>   })
>>   ...
>>   ...
>> })
>>
>>
>> This is identical to this question,
>>
>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
>>
>> It's not intuitive for users to load external classes through reflection,
>> but the couple of available alternatives, including 1) messing around with
>> the systemClassLoader by calling systemClassLoader.addURL through
>> reflection or 2) forking another JVM to add the jars to the classpath
>> before the bootstrap loader runs, are very tricky.
>>
>> Any thoughts on fixing it properly?
>>
>> @Xiangrui,
>> netlib-java jniloader is loaded from netlib-java through reflection, so
>> this problem will not be seen.
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
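
For anyone following along, here is a minimal, self-contained sketch of the reflection pattern quoted above, pulled into two small helpers. It is only an illustration, not anything Spark provides: the object name ContextLoaderReflection and the helpers newInstance and call are made up for this example, and java.lang.StringBuilder is used purely as a stand-in for a class such as CSVRecordParser that would come from a jar added with sc.addJar.

```scala
// Hypothetical helper object; the names here are illustrative only.
object ContextLoaderReflection {

  /** Load `className` through the current thread's context class loader
    * and call the public constructor whose parameter types are `paramTypes`.
    * For a primitive parameter, pass e.g. Character.TYPE as the type and a
    * boxed value as the argument, as in the quoted snippet.
    */
  def newInstance(className: String,
                  paramTypes: Seq[Class[_]],
                  args: Seq[AnyRef]): AnyRef = {
    val loader = Thread.currentThread.getContextClassLoader
    // Resolve the class against the context loader rather than the loader
    // that defined the calling class.
    val clazz = Class.forName(className, true, loader)
    clazz.getConstructor(paramTypes: _*)
      .newInstance(args: _*)
      .asInstanceOf[AnyRef]
  }

  /** Invoke a public method by name on an object obtained via reflection. */
  def call(target: AnyRef,
           method: String,
           paramTypes: Seq[Class[_]],
           args: Seq[AnyRef]): AnyRef =
    target.getClass.getMethod(method, paramTypes: _*).invoke(target, args: _*)

  def main(args: Array[String]): Unit = {
    // Stand-in for the external class: new java.lang.StringBuilder("a,b,c")
    val sb = newInstance("java.lang.StringBuilder",
                         Seq(classOf[String]), Seq("a,b,c"))
    // Equivalent to sb.reverse(), resolved by name at run time.
    val reversed = call(sb, "reverse", Nil, Nil)
    println(reversed)
  }
}
```

Inside mapPartitions the same helpers would be called once per partition, with the fully qualified name of the external class in place of the stand-in. Whether the context class loader on the executor can actually see the added jar is exactly the open question in this thread, so treat this as a way to express the workaround and not as a fix.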