I created a JIRA: https://issues.apache.org/jira/browse/SPARK-1870

DB, could you add more info to that JIRA? Thanks!

-Xiangrui

On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng <men...@gmail.com> wrote:
> Btw, I tried
>
> rdd.map { i =>
>   System.getProperty("java.class.path")
> }.collect()
>
> but didn't see the jars added via "--jars" on the executor classpath.
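
One possible reason the check above comes up empty: java.class.path only records the classpath the JVM was launched with, while jars fetched at runtime are typically exposed through a URLClassLoader further down the loader chain. A minimal sketch that walks the chain instead; ClassLoaderUrls and loaderUrls are hypothetical names, and on Java 9+ the application loader is no longer a URLClassLoader, so it contributes no URLs here:

```scala
import java.net.URLClassLoader

object ClassLoaderUrls {
  // Walk from `start` up through its parents and collect the URLs of every
  // URLClassLoader found on the way. Loaders of other types are skipped.
  def loaderUrls(start: ClassLoader): List[String] =
    Iterator.iterate(start)(_.getParent)
      .takeWhile(_ != null)
      .flatMap {
        case u: URLClassLoader => u.getURLs.map(_.toString).toList
        case _                 => Nil
      }
      .toList
}
```

On an executor this could be called as rdd.map { _ => ClassLoaderUrls.loaderUrls(Thread.currentThread.getContextClassLoader).mkString(":") }.collect(), assuming the added jars are served by a URLClassLoader somewhere in the context loader's chain.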
>
> -Xiangrui
>
> On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <men...@gmail.com> wrote:
>> I can reproduce the error with Spark 1.0-RC on YARN (CDH-5). The
>> reflection approach mentioned by DB didn't work either. I checked the
>> distributed cache on a worker node and found the jar there. It is also
>> listed in the Environment tab of the WebUI. The workaround is to build an
>> assembly jar.
>>
>> DB, could you create a JIRA and describe what you have found so far? Thanks!
>>
>> Best,
>> Xiangrui
>>
>> On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mri...@gmail.com> wrote:
>>> Can you try moving your mapPartitions call to another class/object that is
>>> referenced only after sc.addJar?
>>>
>>> I suspect the ClassNotFoundException is thrown while loading the class
>>> containing mapPartitions, before addJar is executed.
>>>
>>> In general, though, dynamic loading of classes means you use reflection to
>>> instantiate them, since the expectation is that you don't know which
>>> implementation provides the interface. If you know it statically a priori,
>>> you bundle it in your classpath.
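
The pattern described above can be sketched as follows, assuming the interface is on the application classpath and only the implementation's class name is known at runtime. RecordParser, CommaParser and ParserLoader are hypothetical names for illustration; CommaParser stands in for a class that would live in the external jar:

```scala
// Hypothetical interface, known at compile time and bundled with the application.
trait RecordParser {
  def parseLine(line: String): Array[String]
}

// Stand-in implementation; in the scenario above it would come from the external jar.
class CommaParser extends RecordParser {
  def parseLine(line: String): Array[String] = line.split(",", -1)
}

object ParserLoader {
  // Load an implementation by name and hand it back typed as the interface.
  def load(className: String): RecordParser = {
    // Prefer the thread context class loader; fall back to this object's own
    // loader if no context loader is set.
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    Class.forName(className, true, loader)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[RecordParser]
  }
}
```

The caller only ever refers to RecordParser statically, so no reflective method lookups are needed once the instance exists.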
>>>
>>> Regards
>>> Mridul
>>> On 17-May-2014 7:28 am, "DB Tsai" <dbt...@stanford.edu> wrote:
>>>
>>>> I finally found a way out of the ClassLoader maze! It took me some time to
>>>> understand how it works; I think it is worth documenting in a separate
>>>> thread.
>>>>
>>>> We're trying to add an external utility.jar, which contains CSVRecordParser,
>>>> and we added the jar to the executors through the sc.addJar API.
>>>>
>>>> If the instance of CSVRecordParser is created without reflection, it
>>>> raises *ClassNotFoundException*.
>>>>
>>>> data.mapPartitions(lines => {
>>>>     val csvParser = new CSVRecordParser(delimiter.charAt(0))
>>>>     lines.foreach(line => {
>>>>       val lineElems = csvParser.parseLine(line)
>>>>     })
>>>>     ...
>>>>     ...
>>>> })
>>>>
>>>>
>>>> If the instance of CSVRecordParser is created through reflection, it works.
>>>>
>>>> data.mapPartitions(lines => {
>>>>     val loader = Thread.currentThread.getContextClassLoader
>>>>     val CSVRecordParser =
>>>>         loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
>>>>
>>>>     val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
>>>>         .newInstance(delimiter.charAt(0).asInstanceOf[Character])
>>>>
>>>>     val parseLine = CSVRecordParser
>>>>         .getDeclaredMethod("parseLine", classOf[String])
>>>>
>>>>     lines.foreach(line => {
>>>>        val lineElems = parseLine.invoke(csvParser, line)
>>>>            .asInstanceOf[Array[String]]
>>>>     })
>>>>     ...
>>>>     ...
>>>> })
>>>>
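
The difference between the two snippets comes down to which class loader does the lookup: a class referenced with new is resolved by the loader that defined the enclosing class, whereas the reflective version asks the thread context class loader explicitly. A small sketch for comparing the two on any thread; ContextLoader and its members are hypothetical helper names:

```scala
object ContextLoader {
  // Run `body` with `loader` installed as the thread context class loader,
  // restoring the previous loader afterwards, even if `body` throws.
  def withContextClassLoader[A](loader: ClassLoader)(body: => A): A = {
    val thread = Thread.currentThread
    val previous = thread.getContextClassLoader
    thread.setContextClassLoader(loader)
    try body finally thread.setContextClassLoader(previous)
  }

  // The loader that defined this object; statically referenced classes
  // (e.g. `new Foo(...)` written inside this object) are resolved through it.
  def definingLoader: ClassLoader = getClass.getClassLoader

  // The loader that code gets only when it asks for it explicitly,
  // as the reflective snippet does.
  def contextLoader: ClassLoader = Thread.currentThread.getContextClassLoader
}
```

If the two loaders differ inside mapPartitions and only the context loader knows about the added jar, that would be consistent with the behaviour reported here.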
>>>>
>>>> This is the same issue as in this question:
>>>>
>>>> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
>>>>
>>>> It's not intuitive for users to load external classes through reflection,
>>>> but the couple of available alternatives, 1) messing around with
>>>> systemClassLoader by calling systemClassLoader.addURL through reflection, or
>>>> 2) forking another JVM to add the jars to the classpath before the
>>>> bootstrap loader, are both very tricky.
>>>>
>>>> Any thoughts on fixing it properly?
>>>>
>>>> @Xiangrui,
>>>> The netlib-java jniloader is loaded by netlib-java through reflection, so
>>>> this problem will not show up there.
>>>>
>>>> Sincerely,
>>>>
>>>> DB Tsai
>>>> -------------------------------------------------------
>>>> My Blog: https://www.dbtsai.com
>>>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>>>
