zkytech opened a new pull request, #4423: URL: https://github.com/apache/zeppelin/pull/4423
### What is this PR for?
Add **cross datasource query** support to the Spark SQL interpreter. Cross datasource queries are currently supported for these datasources:
- hive
- jdbc
- mongodb

#### How to Use
1. Declare a cross datasource table in the format `interpreterName.databaseName.tableName` (a concrete usage sketch is at the end of this description).
2. `interpreterName` must exist in the Zeppelin interpreter configuration.
3. For a JDBC datasource, the JDBC driver jars must be included in the interpreter dependencies.

### What type of PR is it?
Feature

### Need Help
#### 1. Is there a better way to load all interpreter settings?
This is currently implemented by reading the interpreter settings list inside the zengine module and passing that list to the Spark SQL interpreter, so this pull request touches two modules:
1. `zeppelin-zengine`: read all interpreter settings and pass them to the Spark SQL interpreter
2. `spark-interpreter`: add Spark SQL cross datasource query support

When Spark is launched in `local` or `yarn-client` mode, it is easy to load the interpreter settings list inside `spark-interpreter` and no change to `zeppelin-zengine` is needed. However, when the Spark interpreter is launched in `yarn-cluster` mode, `interpreter.json` does not exist on the yarn-cluster driver node, so the interpreter settings cannot be read there. Therefore I changed `zeppelin-zengine` to read all interpreter settings and pass them to the Spark SQL interpreter, which works in `yarn-cluster` mode (a minimal sketch of this hand-off is at the end of this description). I think changing zengine is not ideal. Is there a better way to get all interpreter settings in `yarn-cluster` mode without changing `zengine`?

#### 2. How to distinguish between user and role in the `option.owners` field of an interpreter setting?
I cannot distinguish users from roles inside the `option.owners` field of an interpreter setting, so the datasource authorization check is implemented with this code:
```java
HashSet<String> usersAndRoles = new HashSet<>(authenticationInfo.getUsersAndRoles());
HashSet<String> owners = new HashSet<>(iSetting.option.owners);
// if owners is empty, all users can access the interpreter
if (!owners.isEmpty()) {
  // keep only the owners that match the current user's name or roles
  owners.retainAll(usersAndRoles);
  if (owners.isEmpty()) {
    // neither the user nor any of the user's roles is an owner
    throw new InvalidCredentialsException(
        String.format("user %s has no privilege to access interpreter %s",
            authenticationInfo.getUser(), interpreterId));
  }
}
```
If there is any security concern, how can I make a better authorization check?

### What is the Jira issue?
[ZEPPELIN-5781]

### How should this be tested?
1. Make sure the Spark SQL interpreter (`%sql`) works.
2. Configure a jdbc / mongodb interpreter with the name `interpreter-nameX`.
3. Query the jdbc/mongodb datasource in `%sql`:
```sql
%sql select * from interpreter-nameX.databaseName.tableName;
```

### Screenshots (if appropriate)

### Questions:
* Do the license files need to be updated? no
* Are there breaking changes for older versions? yes
* Does this need documentation? yes
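To make the `interpreterName.databaseName.tableName` convention in *How to Use* concrete, here is an illustrative query. The interpreter name `mysql_prod`, database `sales`, and table `orders` are hypothetical placeholders, not names used by this PR:

```sql
%sql
-- "mysql_prod" is a JDBC interpreter configured in Zeppelin; "sales.orders" lives in that datasource
select order_id, amount
from mysql_prod.sales.orders
where amount > 100;
```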
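For *Need Help #1*, here is a minimal sketch of the settings hand-off described above, assuming the settings list is serialized to JSON and carried inside the interpreter properties so it is available even on a `yarn-cluster` driver where `interpreter.json` is absent. The property key and the `InterpreterSettingInfo` holder class are hypothetical illustrations, not Zeppelin's actual API:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;

import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

public class InterpreterSettingsTransfer {

  // Hypothetical holder for the fields the Spark SQL interpreter needs;
  // not Zeppelin's real InterpreterSetting class.
  public static class InterpreterSettingInfo {
    public String id;
    public String name;                     // e.g. "mysql_prod"
    public String group;                    // e.g. "jdbc", "mongodb"
    public Map<String, String> properties;  // connection url, user, driver, ...
  }

  private static final Gson GSON = new Gson();

  // Hypothetical property key used to carry the settings into the remote interpreter process.
  public static final String SETTINGS_KEY = "zeppelin.spark.sql.interpreterSettings";

  // zeppelin-zengine side: serialize all settings into the interpreter properties
  // before the Spark interpreter process is launched.
  public static void attachSettings(Properties intpProperties,
                                    List<InterpreterSettingInfo> settings) {
    intpProperties.setProperty(SETTINGS_KEY, GSON.toJson(settings));
  }

  // spark-interpreter side: restore the list on the driver. This also works in
  // yarn-cluster mode, because the JSON travels with the interpreter properties
  // instead of being read from interpreter.json on the driver node.
  public static List<InterpreterSettingInfo> readSettings(Properties intpProperties) {
    String json = intpProperties.getProperty(SETTINGS_KEY, "[]");
    return GSON.fromJson(json, new TypeToken<List<InterpreterSettingInfo>>() {}.getType());
  }
}
```

With a hand-off along these lines, `spark-interpreter` depends only on the properties it already receives at launch, and resolving the `interpreterName` part of a query becomes a lookup of `name` in the restored list.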