[ 
https://issues.apache.org/jira/browse/HIVE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978962#comment-14978962
 ] 

Jason Dere commented on HIVE-11878:
-----------------------------------

Apologies for not getting back to you on this before now. As to your comments:

{quote}
I have some doubts about using the thread context classloader as the parent. 
This does not seem to provide clean isolation between jars/resources between 
different sessions. Case in point: a thread context classloader could be a 
previous session's classloader. This can happen when the same thread was used 
to work on a previous session, and is now being used to work on the newer 
current session. The thread context classloader could contain a different 
implementation of the same class also present in the session classloader. Do 
you see this as a problem?
{quote}
I think you are right, that the new session's classloader would be polluted 
with the loaded JARs from the thread's previous SessionState. What would be a 
better parent class loader to use here - the system class loader? Will it have 
any JARs added from the AUX_JARS options?
I checked the various AUX_JARS-related options, though I may have missed 
some:
 - HIVE_AUX_JARS_PATH environment variable: This gets added to the CLASSPATH, 
so the System class loader will have these
 - hive.aux.jars: As far as I can tell this is used when shipping JARs to the 
Map/Reduce tasks, so this does not seem to get used by the SessionState. 
Someone correct me if I am wrong.
 - hive.reloadable.aux.jars.path: These jars are added at the start of a 
HiveServer2 session, so it may not matter that the System class loader does not 
have these jars.
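
As a minimal sketch of the "system class loader as parent" option, assuming a hypothetical helper newSessionLoader (not an existing Hive method): the new session's loader is parented to the system class loader, so whatever the worker thread's context class loader was left pointing at by an earlier session is ignored. JARs on the CLASSPATH (such as those from HIVE_AUX_JARS_PATH) would still be visible through the system class loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class SessionLoaderSketch {
    // Hypothetical helper: builds a per-session loader whose parent is the
    // system class loader rather than the calling thread's context loader.
    static URLClassLoader newSessionLoader(URL[] sessionJars) {
        return new URLClassLoader(sessionJars, ClassLoader.getSystemClassLoader());
    }

    public static void main(String[] args) {
        // Simulate a worker thread that still carries a previous session's loader.
        URLClassLoader previous = newSessionLoader(new URL[0]);
        Thread.currentThread().setContextClassLoader(previous);

        // The fresh session loader does not inherit from the stale context loader.
        URLClassLoader fresh = newSessionLoader(new URL[0]);
        System.out.println(fresh.getParent() == ClassLoader.getSystemClassLoader());
        System.out.println(fresh.getParent() == previous);
    }
}
```

The trade-off is that anything added only to a previous thread context loader is no longer reachable, which is the isolation being asked for here.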

{quote}
Another potential problem I'm thinking about – which is present in the proposed 
approach (see RB) is – in HiveServer2 any worker thread can serve any request 
by mapping it to a persistent session. Couldn't this lead to a situation where 
for a specific session the session specific classloader (conf.getClassLoader()) 
and the thread context classloader end up being different? Say we have two 
worker threads t1 and t2. The very first query is handled by t1 where a fresh 
session s1 is created along with a fresh classloader c1, which is set as the 
session specific classloader and the thread context classloader. The next query 
for the same session is handled by t2. I guess since it is the same session s1, 
we do not create a fresh classloader. The session specific classloader is c1, 
but since it is a different thread and no classloader has been set on it, the 
thread will have the system classloader as its context classloader. Couldn't 
this cause potential CNF exceptions? If I understood correctly this problem 
also exists in the current implementation, doesn't it?
{quote}
I'm not sure I understand the problem here. I believe that if a new thread 
handles an existing session, the thread calls 
SessionState.setCurrentSessionState(), which should set the thread's context 
class loader to the SessionState's class loader. Or did I misunderstand the 
issue?
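
The behaviour described above can be sketched as follows. This is an illustration only: the Session class and attachToCurrentThread method are hypothetical stand-ins, and the assumption (taken from the paragraph above, not verified against the Hive source here) is that attaching a session to a thread also installs the session's loader as that thread's context class loader. Under that assumption, a second worker thread picking up the same session ends up with the same loader c1.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.atomic.AtomicReference;

public class SessionAttachSketch {
    // Hypothetical stand-in for a session that owns its class loader (c1).
    static final class Session {
        final ClassLoader loader =
            new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());

        // Assumed behaviour: attaching the session to a thread installs the
        // session's loader as that thread's context class loader.
        void attachToCurrentThread() {
            Thread.currentThread().setContextClassLoader(loader);
        }
    }

    // Runs a "query" for the session on a brand-new worker thread and returns
    // the context class loader that thread saw after attaching.
    static ClassLoader contextLoaderSeenByNewWorker(Session s) {
        AtomicReference<ClassLoader> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            s.attachToCurrentThread();
            seen.set(Thread.currentThread().getContextClassLoader());
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Session s1 = new Session();
        ClassLoader onT1 = contextLoaderSeenByNewWorker(s1); // first query, thread t1
        ClassLoader onT2 = contextLoaderSeenByNewWorker(s1); // next query, thread t2
        System.out.println(onT1 == s1.loader && onT2 == s1.loader);
    }
}
```

If some code path serves a session without going through the attach step, t2 would keep whatever context loader it already had, which is the mismatch the quoted comment worries about.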

> ClassNotFoundException can possibly occur if multiple jars are registered 
> one at a time in Hive
> ------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11878
>                 URL: https://issues.apache.org/jira/browse/HIVE-11878
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 1.2.1
>            Reporter: Ratandeep Ratti
>            Assignee: Ratandeep Ratti
>              Labels: URLClassLoader
>         Attachments: HIVE-11878.patch, HIVE-11878_approach3.patch, 
> HIVE-11878_approach3_per_session_clasloader.patch, HIVE-11878_qtest.patch
>
>
> When we register a jar on the Hive console, Hive creates a fresh URL 
> classloader which includes the path of the current jar to be registered and 
> all the jar paths of the parent classloader. The parent classloader is the 
> current ThreadContextClassLoader. Once the URLClassLoader is created, Hive 
> sets that as the current ThreadContextClassLoader.
> So if we register multiple jars in Hive, there will be multiple 
> URLClassLoaders created, each classloader including the jars from its parent 
> and the one extra jar to be registered. The last URLClassLoader created will 
> end up as the current ThreadContextClassLoader. (See details: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath)
> Now here's an example in which the above strategy can lead to a CNF exception.
> We register 2 jars *j1* and *j2* in Hive console. *j1* contains the UDF class 
> *c1* and internally relies on class *c2* in jar *j2*. We register *j1* first, 
> the URLClassLoader *u1* is created and also set as the 
> ThreadContextClassLoader. We register *j2* next, the new URLClassLoader 
> created will be *u2* with *u1* as parent and *u2* becomes the new 
> ThreadContextClassLoader. Note *u2* includes paths to both jars *j1* and *j2* 
> whereas *u1* only has paths to *j1* (For details see: 
> org.apache.hadoop.hive.ql.exec.Utilities#addToClassPath).
> Now when we register class *c1* under a temporary function in Hive, we load 
> the class using {code} Class.forName("c1", true, 
> Thread.currentThread().getContextClassLoader()) {code} . The current 
> thread context class-loader is *u2*, and it has the path to the class *c1*, 
> but note that class-loaders work by delegating to the parent class-loader 
> first. In this case class *c1* will be found and *defined* by class-loader 
> *u1*.
> Now *c1* from jar *j1* has *u1* as its class-loader. If a method (say 
> initialize) is called in *c1*, which references the class *c2*, *c2* will not 
> be found, since the class-loader used to search for *c2* will be *u1* (the 
> caller's defining class-loader is used to resolve the classes it references).
> I've added a qtest to explain the problem. Please see the attached patch.
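
The visibility gap in the description above can be reproduced with a small sketch. To stay self-contained it uses marker files in temporary directories as stand-ins for classes in jars (all names here are hypothetical), and checks resource lookup rather than class definition; resource lookup in URLClassLoader follows the same parent-first delegation. It shows only that *u1* cannot see anything from *j2*, which is why a class defined by *u1* fails to resolve *c2*; it does not itself trigger a ClassNotFoundException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChainedLoaderSketch {
    // Creates a temp directory holding one marker file; the directory stands
    // in for a jar, and the marker file for a class inside it.
    static URL fakeJar(String markerName) {
        try {
            Path dir = Files.createTempDirectory("fakejar");
            Path marker = Files.createFile(dir.resolve(markerName));
            // Registered directory-first so the file is deleted before it on exit.
            dir.toFile().deleteOnExit();
            marker.toFile().deleteOnExit();
            return dir.toUri().toURL();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns {u1 sees c1, u1 sees c2, u2 sees c1, u2 sees c2}.
    static boolean[] visibility() {
        URL j1 = fakeJar("c1.marker");
        URL j2 = fakeJar("c2.marker");
        // u1 knows only j1; u2 knows j1 and j2 and has u1 as its parent,
        // mirroring the chain built by registering j1 and then j2.
        try (URLClassLoader u1 =
                 new URLClassLoader(new URL[] {j1}, ClassLoader.getSystemClassLoader());
             URLClassLoader u2 = new URLClassLoader(new URL[] {j1, j2}, u1)) {
            return new boolean[] {
                u1.getResource("c1.marker") != null,
                u1.getResource("c2.marker") != null,
                u2.getResource("c1.marker") != null,
                u2.getResource("c2.marker") != null
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] v = visibility();
        System.out.println("u1 sees c1=" + v[0] + ", c2=" + v[1]);
        System.out.println("u2 sees c1=" + v[2] + ", c2=" + v[3]);
    }
}
```

Because *u2* asks *u1* first, anything *u1* can supply is supplied by *u1*, and code loaded there is confined to what *u1* and its ancestors can see.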



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
