Hi Patrick,

If spark-submit works correctly, users only need to specify runtime jars via `--jars` instead of calling `sc.addJar`. Is that correct? I checked SparkSubmit and yarn.Client but didn't find any code that handles `args.jars` in YARN mode, so I don't know where in the code the jars in the distributed cache get added to the runtime classpath on executors.
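In case it helps with debugging: `java.class.path` only reflects the classpath the JVM was launched with, so jars added later through a class loader would not show up there. Below is a minimal sketch of a helper that lists the URLs visible through a class loader chain instead. `LoaderUrls` and `urlsOf` are names made up for illustration, not part of Spark.

```scala
import java.io.File
import java.net.{URL, URLClassLoader}

// Hypothetical helper (not a Spark API): inspect what a loader chain can see.
object LoaderUrls {
  // Collect the URLs of every URLClassLoader in the chain, starting at
  // `loader` and walking up through its parents. Loaders that are not
  // URLClassLoaders contribute nothing to the result.
  def urlsOf(loader: ClassLoader): List[URL] = {
    @annotation.tailrec
    def go(cl: ClassLoader, acc: List[URL]): List[URL] = cl match {
      case null              => acc
      case u: URLClassLoader => go(u.getParent, acc ++ u.getURLs.toList)
      case other             => go(other.getParent, acc)
    }
    go(loader, Nil)
  }
}
```

On an executor this could be called as `rdd.map(_ => LoaderUrls.urlsOf(Thread.currentThread.getContextClassLoader).mkString(",")).collect()`. That assumes the executor's context class loader is a `URLClassLoader` subclass, which I have not verified.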
Best,
Xiangrui

On Sun, May 18, 2014 at 11:58 AM, Patrick Wendell <pwend...@gmail.com> wrote:
> @db - it's possible that you aren't including the jar in the classpath
> of your driver program (I think this is what mridul was suggesting).
> It would be helpful to see the stack trace of the CNFE.
>
> - Patrick
>
> On Sun, May 18, 2014 at 11:54 AM, Patrick Wendell <pwend...@gmail.com> wrote:
>> @xiangrui - we don't expect these to be present on the system
>> classpath, because they get dynamically added by Spark (e.g. your
>> application can call sc.addJar well after the JVM's have started).
>>
>> @db - I'm pretty surprised to see that behavior. It's definitely not
>> intended that users need reflection to instantiate their classes -
>> something odd is going on in your case. If you could create an
>> isolated example and post it to the JIRA, that would be great.
>>
>> On Sun, May 18, 2014 at 9:58 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>> I created a JIRA: https://issues.apache.org/jira/browse/SPARK-1870
>>>
>>> DB, could you add more info to that JIRA? Thanks!
>>>
>>> -Xiangrui
>>>
>>> On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>>> Btw, I tried
>>>>
>>>> rdd.map { i =>
>>>>   System.getProperty("java.class.path")
>>>> }.collect()
>>>>
>>>> but didn't see the jars added via "--jars" on the executor classpath.
>>>>
>>>> -Xiangrui
>>>>
>>>> On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>>>> I can re-produce the error with Spark 1.0-RC and YARN (CDH-5). The
>>>>> reflection approach mentioned by DB didn't work either. I checked the
>>>>> distributed cache on a worker node and found the jar there. It is also
>>>>> in the Environment tab of the WebUI. The workaround is making an
>>>>> assembly jar.
>>>>>
>>>>> DB, could you create a JIRA and describe what you have found so far?
>>>>> Thanks!
>>>>>
>>>>> Best,
>>>>> Xiangrui
>>>>>
>>>>> On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mri...@gmail.com>
>>>>> wrote:
>>>>>> Can you try moving your mapPartitions to another class/object which is
>>>>>> referenced only after sc.addJar?
>>>>>>
>>>>>> I would suspect the CNFE is coming while loading the class containing
>>>>>> mapPartitions, before sc.addJar is executed.
>>>>>>
>>>>>> In general though, dynamic loading of classes means you use reflection to
>>>>>> instantiate them, since the expectation is that you don't know which
>>>>>> implementation provides the interface ... If you statically know it
>>>>>> a priori, you bundle it in your classpath.
>>>>>>
>>>>>> Regards
>>>>>> Mridul
>>>>>> On 17-May-2014 7:28 am, "DB Tsai" <dbt...@stanford.edu> wrote:
>>>>>>
>>>>>>> Finally found a way out of the ClassLoader maze! It took me some time to
>>>>>>> understand how it works; I think it's worth documenting in a separate
>>>>>>> thread.
>>>>>>>
>>>>>>> We're trying to add an external utility.jar which contains CSVRecordParser,
>>>>>>> and we added the jar to the executors through the sc.addJar API.
>>>>>>>
>>>>>>> If the instance of CSVRecordParser is created without reflection, it
>>>>>>> raises *ClassNotFoundException*.
>>>>>>>
>>>>>>> data.mapPartitions(lines => {
>>>>>>>   val csvParser = new CSVRecordParser(delimiter.charAt(0))
>>>>>>>   lines.foreach(line => {
>>>>>>>     val lineElems = csvParser.parseLine(line)
>>>>>>>   })
>>>>>>>   ...
>>>>>>>   ...
>>>>>>> })
>>>>>>>
>>>>>>>
>>>>>>> If the instance of CSVRecordParser is created through reflection, it
>>>>>>> works.
>>>>>>>
>>>>>>> data.mapPartitions(lines => {
>>>>>>>   val loader = Thread.currentThread.getContextClassLoader
>>>>>>>   val CSVRecordParser =
>>>>>>>     loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
>>>>>>>
>>>>>>>   val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
>>>>>>>     .newInstance(delimiter.charAt(0).asInstanceOf[Character])
>>>>>>>
>>>>>>>   val parseLine = CSVRecordParser
>>>>>>>     .getDeclaredMethod("parseLine", classOf[String])
>>>>>>>
>>>>>>>   lines.foreach(line => {
>>>>>>>     val lineElems = parseLine.invoke(csvParser,
>>>>>>>       line).asInstanceOf[Array[String]]
>>>>>>>   })
>>>>>>>   ...
>>>>>>>   ...
>>>>>>> })
>>>>>>>
>>>>>>>
>>>>>>> This is identical to this question:
>>>>>>>
>>>>>>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
>>>>>>>
>>>>>>> It's not intuitive for users to load external classes through
>>>>>>> reflection, but the couple of available alternatives are both very
>>>>>>> tricky: 1) messing with the system class loader by calling
>>>>>>> systemClassLoader.addURL through reflection, or 2) forking another JVM
>>>>>>> so that the jars are on the classpath before the bootstrap loader runs.
>>>>>>>
>>>>>>> Any thoughts on fixing it properly?
>>>>>>>
>>>>>>> @Xiangrui,
>>>>>>> netlib-java jniloader is loaded from netlib-java through reflection, so
>>>>>>> this problem will not be seen.
>>>>>>>
>>>>>>> Sincerely,
>>>>>>>
>>>>>>> DB Tsai
>>>>>>> -------------------------------------------------------
>>>>>>> My Blog: https://www.dbtsai.com
>>>>>>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>>>>>>
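
For anyone who wants to try the reflection pattern from DB's message without a cluster, here is a minimal sketch that runs on a plain JVM. It assumes a JDK class (`java.lang.StringBuilder`) as a stand-in for CSVRecordParser, since utility.jar is not available here. `ReflectiveLoad`, `newInstance`, and `call` are hypothetical names used only for illustration.

```scala
// Hypothetical helper (not from the thread): resolve a class by name through
// the thread's context class loader instead of referencing it statically.
object ReflectiveLoad {
  // Load `className` via the context class loader and invoke its
  // single-String-argument constructor.
  def newInstance(className: String, arg: String): AnyRef = {
    val loader = Thread.currentThread.getContextClassLoader
    val cls = Class.forName(className, true, loader)
    cls.getConstructor(classOf[String]).newInstance(arg).asInstanceOf[AnyRef]
  }

  // Invoke a public zero-argument method on `target` by name.
  def call(target: AnyRef, method: String): AnyRef =
    target.getClass.getMethod(method).invoke(target)
}
```

For example, `ReflectiveLoad.call(ReflectiveLoad.newInstance("java.lang.StringBuilder", "abc"), "reverse")` goes through the same lookup steps as the quoted code. Because the class name is only a string, nothing forces the calling class's own loader to resolve it, which is the difference from writing `new CSVRecordParser(...)` directly.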