Hello, dev!

I believe it's a common case that an analyst working with data wants to create
temporary intermediate tables.
In Zeppelin an analyst can clone a note, make small changes in one copy and
schedule both notes to run simultaneously.
Therefore it would be good if temporary tables were isolated from other
analysts and from the same analyst's other sessions.

At our company, Zeppelin queries data in Greenplum Database [1].

We implemented session isolation in Greenplum as follows:
1. The "%greenplum" interpreter uses the login `zeppelin` to query data.
2. The database user `zeppelin` can't create or drop any schema, but it can
call the function `init_zeppelin_schema`.
3. The function `init_zeppelin_schema` has two arguments: user_nm and note_id.
It checks whether the schema "zeppelin_{user_nm}_{note_id}" exists and, if
not, creates it.
This works around the missing privilege because the function is declared with
`SECURITY DEFINER` [2], so it runs with the privileges of its owner rather
than those of `zeppelin`.
4. The next step is to make the temporary schema the default with the query:
'set search_path TO zeppelin_{user_nm}_{note_id};'
5. After that, the analyst can write "create table super_tbl as ....; select *
from super_tbl;" in a paragraph, i.e. use table names without a schema, and
the table will be created in the temporary schema.
6. A system process drops all "zeppelin%" schemas in Greenplum every night.
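Step 6 could look roughly like the following Python sketch. It is an illustration, not our production job: `quote_ident` and `nightly_drop_statements` are hypothetical helpers, and the sketch assumes the schema names have already been fetched, e.g. with `SELECT nspname FROM pg_namespace WHERE nspname LIKE 'zeppelin%'`. It only builds the statements; running them needs a login that is allowed to drop those schemas.

```python
from typing import Iterable, List


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def nightly_drop_statements(schema_names: Iterable[str]) -> List[str]:
    """Build one DROP SCHEMA statement per schema matching LIKE 'zeppelin%'."""
    statements = []
    for name in sorted(schema_names):
        # Same case-sensitive prefix test as the SQL pattern 'zeppelin%'.
        if name.startswith("zeppelin"):
            # CASCADE also drops the analysts' tables inside the schema.
            statements.append(f"DROP SCHEMA IF EXISTS {quote_ident(name)} CASCADE")
    return statements
```

Sorting only makes the output deterministic; the order of the drops does not matter.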

This approach gives us full isolation of temporary objects in the database.
Analysts can run notes without worrying about collisions when reading or
writing the same temporary tables.
If the analyst team is five highly skilled people, one can teach them to run
`init_zeppelin_schema` in each paragraph/note.
If the team is about fifty people, it is hard to teach them all and to check
that they run `init_zeppelin_schema`.
Therefore we implemented the isolation on the backend: the temporary schema is
set up by the JDBC interpreter. Our users just run business queries and don't
know about `init_zeppelin_schema`.
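The order of statements the interpreter issues before each paragraph can be sketched as follows. This is a hypothetical Python illustration against a DB-API 2.0 cursor (such as psycopg2's), not the actual change to Zeppelin's Java JDBC interpreter. `schema_name` and `run_isolated` are made-up names, and the sketch assumes `init_zeppelin_schema` normalizes the schema name the same way (lowercase, characters outside [a-z0-9_] replaced by "_").

```python
import re
from typing import Any, Optional, Protocol, Sequence


class Cursor(Protocol):
    """The subset of a DB-API 2.0 cursor used here."""

    def execute(self, sql: str, params: Optional[Sequence[Any]] = None) -> Any: ...


def schema_name(user_nm: str, note_id: str) -> str:
    """Build zeppelin_{user_nm}_{note_id}, keeping only [a-z0-9_].

    ASSUMPTION: init_zeppelin_schema normalizes the name the same way.
    """
    raw = f"zeppelin_{user_nm}_{note_id}".lower()
    return re.sub(r"[^a-z0-9_]", "_", raw)


def run_isolated(cur: Cursor, user_nm: str, note_id: str, paragraph_sql: str) -> None:
    """Prepare the per-note schema, then run the analyst's paragraph unchanged."""
    # Step 3: the SECURITY DEFINER function creates the schema if it is missing.
    cur.execute("SELECT init_zeppelin_schema(%s, %s)", (user_nm, note_id))
    # Step 4: unqualified table names now resolve to the note's schema.
    # The name contains only [a-z0-9_], so double-quoting it is safe.
    cur.execute(f'SET search_path TO "{schema_name(user_nm, note_id)}"')
    # Step 5: the analyst's SQL runs as-is, e.g. "create table super_tbl as ...".
    cur.execute(paragraph_sql)
```

Because `SET search_path` only affects the current session, all three statements must go through the same connection; with a connection pool, the prologue has to be repeated whenever a paragraph gets a different connection.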

It seems there is no way to achieve such isolation on the Zeppelin side alone;
Zeppelin and the database have to be "integrated".

Is this backend session isolation concept mature enough for the next Zeppelin
release?
The current master branch doesn't allow this kind of isolation. If your team
wants this feature, please reply "+1".

I feel the JDBC interpreter could be used more widely.
In the current stable release, 0.7.3, users must wait several minutes for a
metadata query before their own query starts [3-4].
Therefore many users have not yet fully adopted the JDBC interpreter.


1. http://greenplum.org/
2. http://gpdb.docs.pivotal.io/43180/ref_guide/sql_commands/CREATE_FUNCTION.html
3. https://stackoverflow.com/questions/47722083/apache-zeppelin-0-7-3-oracle-jdbc-issues-long-running-query
4. https://issues.apache.org/jira/browse/ZEPPELIN-3110


Thanks,

Maksim Belousov

