Hi All

I am writing to get a better understanding of how Zeppelin can be
integrated with Spark.

Our use case is to load a few tables from a DB into Spark and run some
transformations. Once that is done, we want to expose the data through
Zeppelin for analytics. I have a few questions around that, to sound out
any gross architectural flaws.
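
For context, here is a rough sketch of the load-and-transform step we have
in mind (assuming Spark 1.x with a SQLContext; the JDBC URL, table name,
and credentials below are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object LoadAndTransform {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("LoadAndTransform"))
    val sqlContext = new SQLContext(sc)

    // Load one table from the DB over JDBC (connection details are made up)
    val orders = sqlContext.read.format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/sales")
      .option("dbtable", "orders")
      .option("user", "etl")
      .option("password", "secret")
      .load()

    // Run a transformation and cache the result, which is what we would
    // then want to expose through Zeppelin
    val bigOrders = orders.filter(orders("amount") > 1000).cache()
    bigOrders.count() // materialize the cache
  }
}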

Questions:

1. How does Zeppelin connect to Spark? Through the Spark Thrift server? A
Thrift/JDBC connection?

2. What is the scope of a Spark application when it is used from Zeppelin?
For example, say I have a few chained transformations and actions in
Zeppelin, like map, filter, reduceByKey, filter, collect. I assume this
will translate to an application and get submitted to Spark. However, if I
want to reuse some part of the data, for example the output of the first
map transformation from that earlier application, can I do it? Or will it
be another application and another spark-submit?

In our use case the data will already be loaded into RDDs. So how can
Zeppelin access them?
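
For concreteness, this is the kind of pipeline I mean, written as two
Zeppelin paragraphs (the input path and threshold are made up):

// Zeppelin paragraph 1: chain the steps listed above, caching the
// intermediate RDD I would like to reuse later
val lines = sc.textFile("/data/events.csv")               // made-up input path
val mapped = lines.map(l => (l.split(",")(0), 1)).cache() // part I want to reuse
val counts = mapped.filter { case (k, _) => k.nonEmpty }.reduceByKey(_ + _)
val frequent = counts.filter { case (_, n) => n > 100 }.collect()

// Zeppelin paragraph 2, run later: does this reuse `mapped` from the same
// Spark application, or does it become a second application / spark-submit?
val distinctKeys = mapped.keys.distinct().count()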

3. How can I control access to specific RDDs for specific users in Zeppelin
(assuming we have implemented some login mechanism in Zeppelin and have a
mapping between Zeppelin users and their LDAP accounts)? Is that even
possible?
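
To make this concrete, the kind of row-level restriction I imagine is
sketched below; the user-to-region mapping and the (region, amount) row
shape are entirely hypothetical:

import org.apache.spark.rdd.RDD

// Hypothetical mapping: Zeppelin user -> regions their LDAP account may see
val userRegions: Map[String, Set[String]] =
  Map("ayan" -> Set("APAC"), "analyst1" -> Set("EMEA", "APAC"))

// Restrict an RDD of (region, amount) rows to what the logged-in user
// is entitled to see
def restrictTo(user: String, rows: RDD[(String, Double)]): RDD[(String, Double)] = {
  val allowed = userRegions.getOrElse(user, Set.empty[String])
  rows.filter { case (region, _) => allowed.contains(region) }
}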

4. If Zeppelin is not yet a good choice for this use case, what are the
alternatives?

I would appreciate any help/pointers/guidance.


-- 
Best Regards,
Ayan Guha
