wuchong commented on PR #20003:
URL: https://github.com/apache/flink/pull/20003#issuecomment-1161906232

   > That depends on how the class-loading order is set up and how you actually use it.
   > If you load everything parent-first within the added sub-tree this problem will not occur.
   
   If we force parent-first mode, the classloader behavior becomes inconsistent 
between local job compilation and distributed execution. Say the user wants to 
use `child-first` for distributed execution to resolve a class conflict between 
the user jar and the Flink core jar. The job then can't be compiled, because the 
client forces parent-first, ignores the user's `classloader.resolve-order` 
configuration, and fails with NoSuchMethod exceptions.
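
   To make the two resolve orders concrete, here is a minimal Java sketch of child-first lookup on top of `URLClassLoader`. The class name `ChildFirstSketchLoader` and the helper `loadOrNull` are hypothetical, and this is not Flink's actual implementation. It only special-cases `java.` classes, whereas a production loader would need a broader always-parent-first list.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical sketch of a child-first classloader; not Flink's actual class. */
final class ChildFirstSketchLoader extends URLClassLoader {

    ChildFirstSketchLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (name.startsWith("java.")) {
                    // JDK classes always come from the parent chain.
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // Child-first: search this loader's own URLs before the parent.
                        c = findClass(name);
                    } catch (ClassNotFoundException notInChild) {
                        // Not in the user jars: fall back to default parent-first delegation.
                        c = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Convenience helper (an assumption of this sketch): null instead of a checked exception. */
    static Class<?> loadOrNull(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }
}
```

   With the default parent-first delegation, a class present in both the user jar and the parent is always taken from the parent. If the client compiled the job that way while the cluster used child-first, the two sides could see different versions of the same class.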
   
   > If we start removing URLs however this very much changes.

   Yes, but we don't support removing URLs/JARs and don't plan to.
   
   > Can you clarify on whether the jars are accessed in between addUrl calls?
   > Would it be technically feasible to first determine all the required jars before creating the first user CL?
   
   Let me clarify the background of this pull request. The motivation is to 
support `ADD JAR` and `CREATE FUNCTION ... USING JAR` 
([FLIP-214](https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL))
 statements in the table ecosystem, especially in the SQL CLI and SQL Gateway. 
Here is a use case for SQL Gateway + ADD JAR. A SQL Gateway (FLIP-91) is a 
long-running service that many users can connect to via REST/JDBC/Beeline... 
Each user has a separate environment (e.g. classloader) and can submit SQL 
statements interactively. For example: 
   
   ```sql
   -- start a SQL CLI and connect to the SQL Gateway, which is serving at 10.10.11.2:8083
   bin/sql-client.sh --endpoint 10.10.11.2:8083 
   
   -- A new session is opened for the current user, and a clean classloader is prepared for the user
   
   -- query is executed without additional user jars
   Flink SQL> SELECT * FROM T;
   
   -- the user jar is added to the user classloader of the current session 
   Flink SQL> ADD JAR '/path/to/aaa.jar';
   
   -- register a user-defined function (UDF) that is loaded from the previously added jar
   Flink SQL> CREATE TEMPORARY FUNCTION lower AS 'org.apache.flink.udf.Lower';
   
   -- query is executed with the added jar and the UDF in the jar.  
   Flink SQL> SELECT id, lower(name) FROM T;
   
   -- SQL Gateway downloads the jar from HDFS to local disk and adds it to the user classloader of the current session.
   -- So the user classloader should now contain both aaa.jar and bbb.jar.
   -- The UDF loaded from bbb.jar is then registered.
   Flink SQL> CREATE TEMPORARY FUNCTION upper AS 'me.wuchong.Upper' USING JAR 
'hdfs:///path/to/bbb.jar';
   
   -- query is executed with the added jars and the UDFs in the jar.  
   Flink SQL> SELECT id, lower(name), upper(name) FROM T;
   
   -- session is closed, and the user classloader for this session is released in SQL Gateway
   Flink SQL> exit;
   ```
   
   So yes, the jars are accessed in between addUrl calls, and we can't 
determine all the required jars before creating the first user CL. Because this 
is an interactive process, jars are added dynamically, and we don't know in 
advance which jars will be added or when. 
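
   As a rough illustration of why the loader has to be mutable, the following sketch shows a per-session loader whose search path grows between uses. `SessionClassLoaderSketch`, `addJar`, and the local path `/tmp/bbb.jar` are assumptions made for illustration, not the classes or paths used in this PR.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

/** Hypothetical per-session loader whose search path can grow after creation. */
final class SessionClassLoaderSketch extends URLClassLoader {

    SessionClassLoaderSketch(ClassLoader parent) {
        super(new URL[0], parent); // a new session starts with no user jars
    }

    /** Widens the visibility of the protected URLClassLoader#addURL. */
    @Override
    public void addURL(URL url) {
        super.addURL(url);
    }

    /** Appends a local jar to the search path; the file is only opened lazily on lookup. */
    void addJar(String localPath) {
        try {
            addURL(Paths.get(localPath).toUri().toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid jar location: " + localPath, e);
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader parent = SessionClassLoaderSketch.class.getClassLoader();
        try (SessionClassLoaderSketch session = new SessionClassLoaderSketch(parent)) {
            // SELECT * FROM T;  -> the loader is already in use with no user jars
            System.out.println(session.getURLs().length);

            // ADD JAR '/path/to/aaa.jar';
            session.addJar("/path/to/aaa.jar");
            System.out.println(session.getURLs().length);

            // CREATE TEMPORARY FUNCTION ... USING JAR 'hdfs:///path/to/bbb.jar';
            // /tmp/bbb.jar stands in for the downloaded local copy (hypothetical path).
            session.addJar("/tmp/bbb.jar");
            System.out.println(session.getURLs().length);
        } // closing the loader corresponds to releasing it when the session ends
    }
}
```

   Because the same loader instance is used for lookups both before and after each `addJar` call, collecting all jars up front is not an option in this flow.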
   
   
   
   

