wuchong commented on PR #20003: URL: https://github.com/apache/flink/pull/20003#issuecomment-1161906232
> That depends on how the class-loading order is set up and how you actually use it. If you load everything parent-first within the added sub-tree this problem will not occur.

If we force parent-first mode, the classloader behavior becomes inconsistent between local job compilation and distributed execution. Say the user wants to use `child-first` for distributed execution to resolve a class conflict between the user jar and the Flink core jar. The job then can't be compiled, because the client forces parent-first, ignores the user's `classloader.resolve-order` configuration, and causes NoSuchMethod exceptions.

> If we start removing URLs however this very much changes.

Yes. But we don't and won't support removing URLs/JARs.

> Can you clarify on whether the jars are accessed in between addUrl calls? Would it be technically feasible to first determine all the required jars before creating the first user CL?

Ah, let me clarify the background of this pull request. The motivation is that we would like to support the `ADD JAR` and `CREATE FUNCTION ... USING JAR` ([FLIP-214](https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL)) statements in the table ecosystem, especially in SQL CLI and SQL Gateway.

I will explain a use case of SQL Gateway + ADD JAR. A SQL Gateway (FLIP-91) is a long-running service that many users can connect to via REST/JDBC/Beeline... Each user has a separate environment (e.g. classloader) and can submit SQL statements interactively.
For example:

```sql
-- start a SQL CLI and connect to the SQL Gateway, which is serving at 10.10.11.2:8083
bin/sql-client.sh --endpoint 10.10.11.2:8083

-- A new session is opened for the current user, and a clean classloader is prepared for the user

-- query is executed without additional user jars
Flink SQL> SELECT * FROM T;

-- the user jar is added to the user classloader of the current session
Flink SQL> ADD JAR '/path/to/aaa.jar';

-- register a user-defined function (UDF) that is loaded from the previously added jar
Flink SQL> CREATE TEMPORARY FUNCTION lower AS 'org.apache.flink.udf.Lower';

-- query is executed with the added jar and the UDF in the jar
Flink SQL> SELECT id, lower(name) FROM T;

-- SQL Gateway downloads the jar from HDFS to local disk and adds it to the user classloader of the current session,
-- so the user classloader should contain both aaa.jar and bbb.jar.
-- Then the UDF that is loaded from bbb.jar is registered.
Flink SQL> CREATE TEMPORARY FUNCTION upper AS 'me.wuchong.Upper' USING JAR 'hdfs:///path/to/bbb.jar';

-- query is executed with the added jars and the UDFs in the jars
Flink SQL> SELECT id, lower(name), upper(name) FROM T;

-- session is closed, and the user classloader for this session is released in SQL Gateway
Flink SQL> exit;
```

So, yes, the jars are accessed in between addUrl calls, and we can't determine all the required jars before creating the first user CL. Because this is an interactive process, the jars are added dynamically, and we don't know which jars will be added or when.
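
To make the "accessed in between addUrl calls" point concrete, here is a minimal sketch in plain Java. It is not the code in this PR: `SessionClassLoaderSketch`, `MutableURLClassLoader`, `fakeJar`, and `runSession` are hypothetical names, and temporary directories containing a marker file stand in for `aaa.jar` and `bbb.jar`. It only relies on the standard `java.net.URLClassLoader`, whose protected `addURL` is widened to public.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SessionClassLoaderSketch {

    /** Hypothetical per-session loader: widens the protected URLClassLoader#addURL to public. */
    static final class MutableURLClassLoader extends URLClassLoader {
        MutableURLClassLoader(ClassLoader parent) {
            super(new URL[0], parent); // starts "clean", with no user jars
        }

        @Override
        public void addURL(URL url) {
            super.addURL(url);
        }
    }

    /** Creates a temp directory holding one marker file; it stands in for a user jar (assumption). */
    static URL fakeJar(String markerName) {
        try {
            Path dir = Files.createTempDirectory("fake-jar");
            Files.write(dir.resolve(markerName), "marker".getBytes(StandardCharsets.UTF_8));
            return dir.toUri().toURL(); // an existing directory yields a URL ending in '/'
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns {aaa visible before add, aaa visible after add, bbb visible before add, both visible at end}. */
    public static boolean[] runSession() {
        URL aaa = fakeJar("aaa.marker");
        URL bbb = fakeJar("bbb.marker");
        try (MutableURLClassLoader session =
                new MutableURLClassLoader(SessionClassLoaderSketch.class.getClassLoader())) {
            // The loader is used before any jar is added, like the first SELECT.
            boolean aaaBefore = session.getResource("aaa.marker") != null;

            session.addURL(aaa); // analogous to ADD JAR
            boolean aaaAfter = session.getResource("aaa.marker") != null;

            // The loader is used again before the second jar is known.
            boolean bbbBefore = session.getResource("bbb.marker") != null;

            session.addURL(bbb); // analogous to CREATE FUNCTION ... USING JAR
            boolean bothAtEnd = session.getResource("aaa.marker") != null
                    && session.getResource("bbb.marker") != null;

            return new boolean[] {aaaBefore, aaaAfter, bbbBefore, bothAtEnd};
        } catch (IOException e) { // thrown by close() when the session loader is released
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = runSession();
        System.out.println("aaa before add: " + r[0]);
        System.out.println("aaa after add:  " + r[1]);
        System.out.println("bbb before add: " + r[2]);
        System.out.println("both at end:    " + r[3]);
    }
}
```

The same loader instance serves lookups before, between, and after the two `addURL` calls, which is why the full jar list cannot be fixed when the session's classloader is created. The sketch always delegates parent-first; honoring `classloader.resolve-order` would need a child-first variant, which is outside what this example shows.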