Hi All, I am writing this to get a fair understanding of how Zeppelin can be integrated with Spark.
Our use case is to load a few tables from a database into Spark and run some transformations. Once that is done, we want to expose the data through Zeppelin for analytics. I have a few questions around this, mainly to sound out any gross architectural flaws.

Questions:

1. How does Zeppelin connect to Spark? Through the Thrift server? Thrift JDBC?

2. What is the scope of a Spark application when it is used from Zeppelin? For example, suppose I run a few subsequent operations in Zeppelin, like map, filter, reduceByKey, filter, collect. I assume this will translate to one application and get submitted to Spark. However, if I later want to reuse some part of the data, say the output of the first map transformation from the earlier application, can I do that? Or will it be another application and another spark-submit? In our use case the data will already be loaded into RDDs, so how can Zeppelin access them? (See the sketch after my signature for the kind of paragraph sequence I have in mind.)

3. How can I restrict access to specific RDDs to specific users in Zeppelin (assuming we have implemented some login mechanism in Zeppelin and have a mapping between Zeppelin users and their LDAP accounts)? Is that even possible?

4. If Zeppelin is not (yet) a good fit for this use case, what are the alternatives?

I would appreciate any help/pointers/guidance.

--
Best Regards,
Ayan Guha
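
P.S. To make question 2 concrete, below is roughly the paragraph sequence I have in mind, written against the Spark Scala API. This is only a sketch: the file path, the key extraction, and all variable names are invented for illustration, and I am assuming sc is the SparkContext that Zeppelin's Spark interpreter provides.

// Paragraph 1: build and cache an intermediate RDD.
// (Data source and parsing are hypothetical; in our case the RDDs
// would already be loaded by an upstream job.)
val raw = sc.textFile("/data/events.txt")
val mapped = raw.map(line => (line.split(",")(0), 1))
mapped.cache() // keep this step around so later paragraphs can reuse it

// Paragraph 2: continue the pipeline from the cached step.
val counts = mapped.filter { case (k, _) => k.nonEmpty }
  .reduceByKey(_ + _)
  .filter { case (_, n) => n > 10 }
counts.collect().foreach(println)

// Paragraph 3, run later: reuse "mapped" without recomputing it.
// My question is whether this hits the cached RDD (i.e. both
// paragraphs share one SparkContext), or whether it becomes a
// separate application and a fresh spark-submit.
val distinctKeys = mapped.keys.distinct().count()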