Hi,

To show more intuitively how the distributed Zeppelin cluster is actually used, I have updated the design document. Starting on page 16, I added two GIF animations: screen recordings of the Zeppelin cluster we are running now.

https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#

The distributed Zeppelin cluster is already in use at our company, and the screen recordings are all real.

The first GIF shows how to create a cluster of three Zeppelin servers:

1. Add 234, 235, and 236 to the zeppelin.cluster.addr property in zeppelin-site.xml to create the cluster.
2. Start the 3 servers at the same time.
3. Open the web pages of the 3 servers and prepare for the notebook operation.

The second GIF shows how an interpreter process is created in the cluster:

1. Create a notebook on host234 and execute it. This creates an interpreter process on a server in the cluster that has free resources.
2. Continue editing the same notebook on host235 and execute it. The results return immediately, with no wait for an interpreter process to be created.
3. Continue editing the same notebook on host236 and execute it. Again, the results return immediately, with no wait for an interpreter process to be created.

The same notebook reuses the interpreter process that was created first, so you get the execution result immediately on any server. If you look at the server processes in the background, you will find that host234, host235, and host236 use the same interpreter process for the same notebook.

Originally, I also wanted to record an interpreter process failing and the cluster re-creating it on an idle server, but I am too tired now. I will record that later when there is time.

> On July 19, 2018, at 7:36 AM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
>
> Thank you liuxun,
>
> I left a couple of comments in that Google document.
>
> --
> Ruslan Dautkhanov
>
>
> On Tue, Jul 17, 2018 at 11:30 PM liuxun <neliu...@163.com> wrote:
> Hi Ruslan Dautkhanov,
>
> Thank you very much for your question. Following your advice, I added 3
> schematics to illustrate:
> 1. Distributed Zeppelin Deployment architecture diagram.
> 2. Distributed zeppelin Server fault tolerance diagram.
> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>
>
> The email attachment exceeded the size limit, so I reorganized the document
> and updated it with Google Docs.
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing
>
>
>> On July 18, 2018, at 1:03 PM, liuxun <neliu...@163.com> wrote:
>>
>> Hi Ruslan Dautkhanov,
>>
>> Thank you very much for your question. Following your advice, I added 3
>> schematics to illustrate:
>> 1. Zeppelin Cluster architecture diagram.
>> 2. Distributed zeppelin Server fault tolerance diagram.
>> 3. Distributed zeppelin Server & intp process fault tolerance diagram.
>>
>> Later, I will merge the schematics into the system design document.
>>
>> <Zeppelin system architecture diagram00.png>
>>
>> <Distributed zeppelin Server fault tolerance diagram 1.png>
>>
>> <Distributed zeppelin Server fault tolerance diagram 2.png>
>>
>>> On July 18, 2018, at 1:16 AM, Ruslan Dautkhanov <dautkha...@gmail.com> wrote:
>>>
>>> Nice.
>>>
>>> Thanks for sharing.
>>>
>>> Can you explain how users are routed to a particular Zeppelin server
>>> instance? I've seen nginx on top of them, but I don't think the document
>>> covers the details. If one Zeppelin server goes down or becomes unhealthy,
>>> is nginx supposed to detect that (if so, how?) and reroute users to a
>>> surviving instance?
>>>
>>> Thanks,
>>> Ruslan Dautkhanov
>>>
>>>
>>> On Tue, Jul 17, 2018 at 2:46 AM liuxun <neliu...@163.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Our company has installed and deployed many Zeppelin instances for data analysis.
>>>> The single-server version of Zeppelin could not meet our application
>>>> scenarios, so we transformed Zeppelin into a clustered service that
>>>> supports distributed deployment, with a unified entrance, high
>>>> availability, and high server resource utilization. The email attachment
>>>> is the entire design document. I am very happy to contribute our modified
>>>> code back to the community.
>>>>
>>>>
>>>> This is the JIRA I submitted in the community:
>>>>
>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471
>>>>
>>>>
>>>> Since the design document size exceeds the mail attachment size limit, the
>>>> document link address has to be sent.
>>>>
>>>> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf
>>>>
>>>> https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png
>>>>
>>>>
>>>> liuxun
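
P.S. For readers who cannot open the GIFs, the interpreter reuse behavior described at the top of this message (first run creates the interpreter process on a server with free resources, later runs from any server reuse it) can be pictured with a small Python sketch. This is only a toy model under stated assumptions, not the code from ZEPPELIN-3471: the Cluster class, the run_note method, the "free memory" numbers, and the rule "pick the host with the most free memory" are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Cluster:
    """Toy model of a Zeppelin cluster. Every name here is hypothetical."""

    # host name -> free memory in MB (stand-in for "free resources")
    free_memory_mb: Dict[str, int]
    # note id -> host that runs the interpreter process for that note
    interpreters: Dict[str, str] = field(default_factory=dict)

    def run_note(self, note_id: str, via_server: str) -> Tuple[str, bool]:
        """Run a note through `via_server`.

        Returns (host running the interpreter, whether it was just created).
        """
        if via_server not in self.free_memory_mb:
            raise KeyError(f"unknown server: {via_server}")
        # Reuse: any server in the cluster sees the existing interpreter.
        if note_id in self.interpreters:
            return self.interpreters[note_id], False
        # First run: assumed placement rule, pick the host with most free memory.
        host = max(self.free_memory_mb, key=self.free_memory_mb.get)
        self.interpreters[note_id] = host
        return host, True


cluster = Cluster({"host234": 2048, "host235": 8192, "host236": 4096})
for server in ("host234", "host235", "host236"):
    host, created = cluster.run_note("note-1", via_server=server)
    print(f"{server}: interpreter on {host}, newly created: {created}")
```

Only the first call pays the cost of creating an interpreter process; the calls made through the other two servers find the existing one, which matches what the second GIF shows.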