I created a JIRA: https://issues.apache.org/jira/browse/SPARK-1870
DB, could you add more info to that JIRA? Thanks!

-Xiangrui

On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng <men...@gmail.com> wrote:
> Btw, I tried
>
> rdd.map { i =>
>   System.getProperty("java.class.path")
> }.collect()
>
> but didn't see the jars added via "--jars" on the executor classpath.
>
> -Xiangrui
>
> On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> I can reproduce the error with Spark 1.0-RC and YARN (CDH-5). The
>> reflection approach mentioned by DB didn't work either. I checked the
>> distributed cache on a worker node and found the jar there. It is also
>> in the Environment tab of the WebUI. The workaround is making an
>> assembly jar.
>>
>> DB, could you create a JIRA and describe what you have found so far? Thanks!
>>
>> Best,
>> Xiangrui
>>
>> On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mri...@gmail.com>
>> wrote:
>>> Can you try moving your mapPartitions to another class/object which is
>>> referenced only after sc.addJar?
>>>
>>> I would suspect the CNFEx is thrown while loading the class containing
>>> mapPartitions, before addJar is executed.
>>>
>>> In general though, dynamic loading of classes means you use reflection to
>>> instantiate them, since the expectation is that you don't know which
>>> implementation provides the interface ... If you statically know it
>>> a priori, you bundle it in your classpath.
>>>
>>> Regards
>>> Mridul
>>> On 17-May-2014 7:28 am, "DB Tsai" <dbt...@stanford.edu> wrote:
>>>
>>>> Finally found a way out of the ClassLoader maze! It took me some time to
>>>> understand how it works; I think it is worth documenting in a separate
>>>> thread.
>>>>
>>>> We're trying to add an external utility.jar which contains CSVRecordParser,
>>>> and we added the jar to the executors through the sc.addJar API.
>>>>
>>>> If the instance of CSVRecordParser is created without reflection, it
>>>> raises *ClassNotFoundException*.
>>>>
>>>> data.mapPartitions(lines => {
>>>>   val csvParser = new CSVRecordParser(delimiter.charAt(0))
>>>>   lines.foreach(line => {
>>>>     val lineElems = csvParser.parseLine(line)
>>>>   })
>>>>   ...
>>>>   ...
>>>> })
>>>>
>>>>
>>>> If the instance of CSVRecordParser is created through reflection, it works.
>>>>
>>>> data.mapPartitions(lines => {
>>>>   val loader = Thread.currentThread.getContextClassLoader
>>>>   val CSVRecordParser =
>>>>     loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
>>>>
>>>>   val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
>>>>     .newInstance(delimiter.charAt(0).asInstanceOf[Character])
>>>>
>>>>   val parseLine = CSVRecordParser
>>>>     .getDeclaredMethod("parseLine", classOf[String])
>>>>
>>>>   lines.foreach(line => {
>>>>     val lineElems = parseLine.invoke(csvParser,
>>>>       line).asInstanceOf[Array[String]]
>>>>   })
>>>>   ...
>>>>   ...
>>>> })
>>>>
>>>>
>>>> This is identical to this question:
>>>>
>>>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
>>>>
>>>> It's not intuitive for users to load external classes through reflection,
>>>> but the couple of available alternatives, 1) messing with the
>>>> systemClassLoader by calling systemClassLoader.addURL through reflection,
>>>> or 2) forking another JVM to add jars to the classpath before the
>>>> bootstrap loader runs, are very tricky.
>>>>
>>>> Any thoughts on fixing it properly?
>>>>
>>>> @Xiangrui,
>>>> netlib-java jniloader is loaded from netlib-java through reflection, so
>>>> this problem will not be seen.
>>>>
>>>> Sincerely,
>>>>
>>>> DB Tsai
>>>> -------------------------------------------------------
>>>> My Blog: https://www.dbtsai.com
>>>> LinkedIn: https://www.linkedin.com/in/dbtsai
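
For reference, the reflection workaround described in the quoted message can be reduced to a small standalone sketch that runs without Spark. The names `ReflectiveLoad`, `contextLoader`, and `newInstanceVia` are hypothetical and not part of Spark or of utility.jar. Since utility.jar is not available here, a JDK class such as `java.lang.StringBuilder` has to stand in for CSVRecordParser when trying it out. This only illustrates the loading pattern, not a fix for SPARK-1870.

```scala
object ReflectiveLoad {
  // Prefer the thread's context class loader; fall back to the system
  // loader when no context loader has been set on this thread.
  def contextLoader: ClassLoader =
    Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(ClassLoader.getSystemClassLoader)

  // Resolve `className` at runtime through `loader` and call its public
  // one-argument constructor. The class is never referenced statically,
  // so it does not have to be visible to the loader of this object.
  def newInstanceVia(loader: ClassLoader,
                     className: String,
                     argType: Class[_],
                     arg: AnyRef): AnyRef = {
    val cls = loader.loadClass(className) // throws ClassNotFoundException if absent
    cls.getConstructor(argType).newInstance(arg).asInstanceOf[AnyRef]
  }
}
```

A call such as `ReflectiveLoad.newInstanceVia(ReflectiveLoad.contextLoader, "java.lang.StringBuilder", classOf[String], "a,b")` goes through the same steps as the reflective mapPartitions body above. As noted earlier in the thread, this approach was reported not to help on YARN, where the assembly jar remained the workaround.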