Btw, I tried rdd.map { i => System.getProperty("java.class.path") }.collect()
but didn't see the jars added via "--jars" on the executor classpath.

-Xiangrui

On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <men...@gmail.com> wrote:
> I can reproduce the error with Spark 1.0-RC and YARN (CDH-5). The
> reflection approach mentioned by DB didn't work either. I checked the
> distributed cache on a worker node and found the jar there. It is also
> in the Environment tab of the WebUI. The workaround is making an
> assembly jar.
>
> DB, could you create a JIRA and describe what you have found so far? Thanks!
>
> Best,
> Xiangrui
>
> On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>> Can you try moving your mapPartitions to another class/object which is
>> referenced only after sc.addJar?
>>
>> I would suspect the CNFEx is coming while loading the class containing
>> mapPartitions before addJars is executed.
>>
>> In general though, dynamic loading of classes means you use reflection to
>> instantiate it, since the expectation is you don't know which implementation
>> provides the interface ... If you statically know it a priori, you bundle it
>> in your classpath.
>>
>> Regards
>> Mridul
>> On 17-May-2014 7:28 am, "DB Tsai" <dbt...@stanford.edu> wrote:
>>
>>> Finally found a way out of the ClassLoader maze! It took me some time to
>>> understand how it works; I think it is worth documenting in a separate
>>> thread.
>>>
>>> We're trying to add an external utility.jar which contains CSVRecordParser,
>>> and we added the jar to executors through the sc.addJar API.
>>>
>>> If the instance of CSVRecordParser is created without reflection, it
>>> raises *ClassNotFoundException*.
>>>
>>> data.mapPartitions(lines => {
>>>     val csvParser = new CSVRecordParser(delimiter.charAt(0))
>>>     lines.foreach(line => {
>>>       val lineElems = csvParser.parseLine(line)
>>>     })
>>>     ...
>>>     ...
>>> })
>>>
>>>
>>> If the instance of CSVRecordParser is created through reflection, it works.
>>>
>>> data.mapPartitions(lines => {
>>>     val loader = Thread.currentThread.getContextClassLoader
>>>     val CSVRecordParser =
>>>         loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
>>>
>>>     val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
>>>         .newInstance(delimiter.charAt(0).asInstanceOf[Character])
>>>
>>>     val parseLine = CSVRecordParser
>>>         .getDeclaredMethod("parseLine", classOf[String])
>>>
>>>     lines.foreach(line => {
>>>       val lineElems = parseLine.invoke(csvParser, line)
>>>         .asInstanceOf[Array[String]]
>>>     })
>>>     ...
>>>     ...
>>> })
>>>
>>>
>>> This is identical to this question,
>>>
>>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
>>>
>>> It's not intuitive for users to load external classes through reflection,
>>> but the couple of available alternatives, including 1) messing around with
>>> the systemClassLoader by calling systemClassLoader.addURL through reflection
>>> or 2) forking another JVM to add jars into the classpath before the
>>> bootstrap loader, are very tricky.
>>>
>>> Any thoughts on fixing it properly?
>>>
>>> @Xiangrui,
>>> netlib-java jniloader is loaded from netlib-java through reflection, so
>>> this problem will not be seen.
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> -------------------------------------------------------
>>> My Blog: https://www.dbtsai.com
>>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>>
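
For anyone who wants to try the reflection pattern from DB's second snippet
outside Spark, here is a minimal sketch. It only illustrates the mechanism
(look the class up through the thread's context class loader, then construct
and call it reflectively) and does not reproduce the Spark/YARN classpath
problem itself. The object name ReflectiveLoadSketch and the helper
invokeViaContextLoader are hypothetical, and java.lang.StringBuilder stands
in for CSVRecordParser because utility.jar is not available here.

```scala
object ReflectiveLoadSketch {
  // Hypothetical helper (not from the thread): loads `className` through the
  // thread's context class loader, constructs an instance from one String
  // argument, and calls a public no-argument method on it via reflection.
  def invokeViaContextLoader(className: String,
                             ctorArg: String,
                             methodName: String): AnyRef = {
    // Fall back to the system class loader if no context loader is set.
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(ClassLoader.getSystemClassLoader)

    val clazz = loader.loadClass(className)

    // The class is never named statically, so nothing here forces the JVM to
    // resolve it through the loader that defined this object.
    val instance = clazz.getConstructor(classOf[String])
      .newInstance(ctorArg)
      .asInstanceOf[AnyRef]

    clazz.getMethod(methodName).invoke(instance)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for CSVRecordParser: build a StringBuilder and reverse it.
    val reversed =
      invokeViaContextLoader("java.lang.StringBuilder", "abc", "reverse")
    println(reversed) // prints "cba"
  }
}
```

In a Spark job the same lookup would sit inside the mapPartitions closure, as
in DB's snippet, so that it runs on the executor thread.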