We use plotly for charts quite often (plotly python in conjunction with 
pyspark), and this reveals a weakness regarding third-party JS library 
integration.

Unfortunately, the current plotly integration is not very efficient, which 
leads to huge notebooks: every paragraph that renders a plotly chart has to 
output the full plotly.js code (~3 MB). When a notebook contains several such 
charts, it becomes very slow. I think this also applies to other JS libraries.

For example, you can render plotly charts in Zeppelin with pyspark as follows 
(the first print only demonstrates the actual HTML output in clear text):
---------------------------------------------
%pyspark
from plotly import graph_objs as go
from plotly.offline import plot

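# include_plotlyjs=True embeds the complete plotly.js bundle (~3 MB) in the div
out = plot([go.Scatter(x=[1, 2, 3], y=[3, 1, 6])],
           include_plotlyjs=True, output_type='div')

print(out)            # show the generated HTML as plain text
print('%html', out)   # render the chart in Zeppelin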
---------------------------------------------

An alternative would be to load plotly.js (or any other third-party library) 
from a CDN, e.g.
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
and to set include_plotlyjs to False in the example above, which reduces the 
output to the bare minimum of code for that specific chart.
However, this script tag needs to be added to the HTML <head>; otherwise, any 
dependent code using that library may fail with 'Uncaught ReferenceError: ... is 
not defined' errors due to timing, i.e. the chart code can run before the 
library has finished loading.
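One possible way around the timing problem without touching index.html is to 
let each paragraph load the library itself and defer its chart code until the 
script's load event fires. This is a minimal sketch, untested in Zeppelin, 
assuming that %html output executes inline <script> tags (which the 
include_plotlyjs=True example above already relies on). deferred_plotly_html 
is a hypothetical helper, and fig_json is assumed to be the figure serialized 
as JSON with "data" and "layout" keys.

```python
import html
import json
from string import Template

# Same CDN URL as in the text above; any other plotly.js URL would work too.
PLOTLY_CDN = "https://cdn.plot.ly/plotly-latest.min.js"

# $-placeholders are filled in by deferred_plotly_html(); the JS itself uses no '$'.
_TEMPLATE = Template("""<div id="$html_id"></div>
<script type="text/javascript">
(function () {
  var divId = $js_id;
  var src = $js_src;
  var fig = $fig_json;
  function draw() { Plotly.newPlot(divId, fig.data, fig.layout); }
  // Library already loaded (e.g. by an earlier paragraph): draw immediately.
  if (window.Plotly) { draw(); return; }
  // Reuse a <script> tag for the same URL if one exists, else add one to <head>.
  var tag = document.querySelector('script[src="' + src + '"]');
  if (!tag) {
    tag = document.createElement("script");
    tag.src = src;
    document.head.appendChild(tag);
  }
  // Run the chart code only after the library has finished loading.
  tag.addEventListener("load", draw);
})();
</script>""")


def deferred_plotly_html(fig_json: str, div_id: str, src: str = PLOTLY_CDN) -> str:
    """Hypothetical helper: %html output that draws fig_json once Plotly exists."""
    # Stop a '</script>' inside JSON strings from closing the element early.
    safe_json = fig_json.replace("</", "<\\/")
    return _TEMPLATE.substitute(
        html_id=html.escape(div_id, quote=True),
        js_id=json.dumps(div_id),
        js_src=json.dumps(src),
        fig_json=safe_json,
    )
```

In a %pyspark paragraph this would be used as 
print('%html', deferred_plotly_html(fig.to_json(), 'chart1')), where fig is a 
go.Figure built from the same Scatter trace as above. Several paragraphs on the 
same page then share one <script> tag instead of each embedding ~3 MB.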

I found this setup guide for repackaging the zeppelin-web archive to include a 
custom library in index.html, but IMHO that is only a workaround:
https://github.com/beljun/zeppelin-plotly

Therefore, I would suggest adding a Zeppelin server-wide configuration option 
that allows Zeppelin administrators to include additional <script> tags in the 
head section.
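To make the proposal more concrete, here is a minimal sketch of the 
transformation such an option could apply when index.html is served: read a 
comma-separated list of URLs from a hypothetical setting (e.g. 
zeppelin.web.head.scripts, which is not an existing Zeppelin property) and 
insert one <script> tag per URL just before </head>. Zeppelin's server is 
written in Java; Python is used here only to illustrate the idea.

```python
import html
from typing import Iterable, List


def parse_script_list(value: str) -> List[str]:
    """Split a comma-separated list of script URLs, dropping blank entries.

    `value` stands in for a hypothetical property such as
    zeppelin.web.head.scripts (not an existing Zeppelin setting).
    """
    return [url.strip() for url in value.split(",") if url.strip()]


def inject_head_scripts(index_html: str, script_urls: Iterable[str]) -> str:
    """Insert one <script src="..."> tag per URL just before </head>."""
    tags = "".join(
        '<script src="%s"></script>\n' % html.escape(url, quote=True)
        for url in script_urls
    )
    pos = index_html.find("</head>")
    if pos == -1:
        raise ValueError("index.html has no </head> tag")
    return index_html[:pos] + tags + index_html[pos:]
```

Because the tags end up in <head>, the libraries load before any notebook 
paragraph runs, which avoids the ReferenceError timing issue for all users.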

Any other recommendations or ideas for solving the timing issue?
