Hi Andrey,

To get rid of the lag that the SVG images are causing in your notebook, you can hide the paragraph output. Look for this icon in the upper right-hand corner of the paragraph: https://puu.sh/rq0JU/6fa29f2ff9.png
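
For the two problems covered in the rest of this message (selecting the Agg backend, and defining your own show function that emits PNG instead of SVG), here is a rough sketch of what a first paragraph could contain. The helper names headless_pyplot and png_html are made up for illustration, and this is not the code from the PR. It assumes that Zeppelin renders printed output starting with %html as HTML, which your existing SVG helper already relies on.

```python
import base64
import io


def headless_pyplot():
    """Select the non-GUI Agg backend, then import and return pyplot.

    This only helps if matplotlib.pyplot (or pylab) has not been imported
    yet in this interpreter; otherwise restart the interpreter first.
    """
    import matplotlib
    matplotlib.use('Agg')  # must run before the first pyplot/pylab import
    import matplotlib.pyplot as plt
    return plt


def png_html(p, width=600):
    """Render p (pyplot or a Figure) to PNG and return a %html string."""
    buf = io.BytesIO()
    p.savefig(buf, format='png')
    encoded = base64.b64encode(buf.getvalue()).decode('ascii')
    return ("%html <img src='data:image/png;base64,{0}' "
            "style='width:{1}px'/>".format(encoded, width))


def show(p, width=600):
    """Print the embedded PNG so the paragraph output displays it."""
    print(png_html(p, width))
```

In a %pyspark paragraph you would call plt = headless_pyplot() once at the top of the notebook, and then show(plt) followed by plt.close() after each plot, exactly as you do now with the SVG version.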

For your first problem, matplotlib is very inflexible when it comes to setting the backend. The default backend on many installations (including yours, judging by the backend_qt4agg.py frames in your traceback) is Qt4Agg, which is a GUI backend and therefore requires DISPLAY to be set in your environment (e.g. through X11). Hence, you should always call matplotlib.use('Agg') before importing or calling anything else that plots, including matplotlib.pyplot and pylab. In fact, it is good practice to do this in your very first paragraph, before running any of the others. plt.switch_backend() can work in certain circumstances, but the safe bet is to restart the interpreter through the Interpreter menu in Zeppelin and then run the paragraphs again. If you don't want to do this for every notebook, your best bet is to change the default backend to Agg in your matplotlibrc file. Part of the ongoing development work for Zeppelin will involve creating a custom matplotlib backend that is selected by default, so users won't have to worry about this in the future.

For the second problem, the PR that I linked you to has not been merged, and it only works with the python (not pyspark) interpreter. You'll need to define the show function yourself somewhere in your notebook.

Hope this helps.

Thanks,
Alex

On Tue, Sep 27, 2016 at 2:22 PM, Андрей Ривкин <amriv...@gmail.com> wrote:

> Hi Alex,
>
> Thank you, we will give PNG a try.
>
> Our dataset is very small (for Big Data and Hadoop) - only 4 MB. We have
> 40 000 rows x 17 columns. Not so big.
>
> But it seems that 40k dots is too much for my browser. Also, maybe
> Zeppelin should somehow disable such difficult paragraphs and not the
> whole notebook.
> And it's very difficult to change the notebook after this plot was
> painted. Is there any way to clean up all results of a notebook before
> opening it?
>
> If we do import os, then we get:
>
> Traceback (most recent call last):
>   File "/tmp/zeppelin_pyspark-3283164060812521118.py", line 239, in <module>
>     eval(compiledCode)
>   File "<string>", line 1, in <module>
>   File "/opt/anaconda2/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2951, in hist_series
>     plt.figure(figsize=figsize))
>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 527, in figure
>     **kwargs)
>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager
>     return new_figure_manager_given_figure(num, thisFig)
>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure
>     canvas = FigureCanvasQTAgg(figure)
>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in __init__
>     FigureCanvasQT.__init__(self, figure)
>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 68, in __init__
>     _create_qApp()
>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp
>     raise RuntimeError('Invalid DISPLAY variable')
> RuntimeError: Invalid DISPLAY variable
>
> Also trying:
>
> %pyspark
> import matplotlib
> matplotlib.use('Agg')
> import matplotlib.pyplot as plt
> def show(p):
>     z.show(plt, fmt="png")
>
> %pyspark
> raw_data['age'].hist(bins=20,color = 'g')
> plt.xlabel('Age')
> plt.ylabel('Number of people')
> plt.title('Age distribution')
> show(plt)
> plt.close()
>
> Traceback (most recent call last):
>   File "/tmp/zeppelin_pyspark-7759686698822330483.py", line 239, in <module>
>     eval(compiledCode)
>   File "<string>", line 5, in <module>
>   File "<string>", line 5, in show
> TypeError: show() got an unexpected keyword argument 'fmt'
>
> Regards,
> Andrey
>
> 2016-09-27 20:53 GMT+03:00 Goodman, Alexander (398K) <alexander.good...@jpl.nasa.gov>:
>
>> Hi Andrey,
>>
>> These two lines:
>>
>> os.system("export DISPLAY=:0")
>> plt.switch_backend('Agg')
>>
>> should not be necessary since you have already set the backend manually
>> to AGG.
>>
>> More importantly, how large is your dataset? While SVG looks nice, it
>> does not scale well with large datasets. I would suggest you try using PNG
>> images instead. Some code for doing this can be found in this PR:
>> https://github.com/apache/zeppelin/pull/1422. There is work ongoing to
>> improve matplotlib integration with zeppelin even further than this, but
>> this solution should be sufficient for you right now. If you are still
>> having problems, let us know.
>>
>> Thanks,
>> Alex
>>
>> On Tue, Sep 27, 2016 at 3:27 AM, Андрей Ривкин <amriv...@gmail.com>
>> wrote:
>>
>>> Hi Alex,
>>>
>>> Here is the exported notebook and sample data.
>>>
>>> The notebook is quite heavy (19 MB). Where can I upload it?
>>>
>>> Here is a code sample:
>>>
>>> %pyspark
>>>
>>> import matplotlib
>>> import os
>>>
>>> from pylab import figure, show, rand
>>> from matplotlib.patches import Ellipse
>>> import matplotlib.pyplot as plt
>>> # helper function to display in Zeppelin
>>>
>>> matplotlib.use('Agg')
>>> os.system("export DISPLAY=:0")
>>> plt.switch_backend('Agg')
>>>
>>> import StringIO
>>> def show(p):
>>>     img = StringIO.StringIO()
>>>     p.savefig(img, format='svg')
>>>     img.seek(0)
>>>     print "%html <div style='width:600px'>" + img.buf + "</div>"
>>>
>>> %pyspark
>>> data_1 = data.ix[data['y']==1]
>>> data_0 = data.ix[data['y']==0]
>>> x_1 = data_1['balance'].values
>>> y_1 = data_1['age'].values
>>> x_0 = data_0['balance'].values
>>> y_0 = data_0['age'].values
>>> colors = ['red','green']
>>> plt.figure(figsize=(10, 6))
>>> plt.xlabel('Balance')
>>> plt.ylabel('Age')
>>> plt.title('')
>>> plt.scatter(x_0, y_0, alpha=0.5, color='blue')
>>> plt.scatter(x_1, y_1, alpha=0.5, color='red')#matplotlib.colors.ListedColormap(colors)
>>> plt.title('Destributions of balance by age and target value')
>>> show(plt)
>>> plt.close()
>>>
>>> Regards,
>>> Andrey
>>>
>>> 2016-09-20 19:20 GMT+03:00 Goodman, Alexander (398K) <
>>> alexander.good...@jpl.nasa.gov>:
>>>
>>>> Hi Andrey,
>>>>
>>>> Would you be able to post the code you were using so we can try to
>>>> reproduce your problem, including how you are generating the images
>>>> inline (e.g., is your chosen image format png or svg)?
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> On Tue, Sep 20, 2016 at 9:12 AM, Андрей Ривкин <amriv...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are using Zeppelin 0.6.1 with Spark 1.6.2.
>>>>>
>>>>> We have a very simple demo and a small file. If we just want to
>>>>> calculate something, it's OK.
>>>>> But when we try to visualize using matplotlib, Zeppelin hangs (even
>>>>> the scroll bar) and then disconnects.
>>>>>
>>>>> We are using Chrome.
>>>>>
>>>>> In the logs there is just this:
>>>>>
>>>>> INFO [2016-09-20 18:44:22,574] ({pool-1-thread-10} Paragraph.java[jobRun]:252) - run paragraph 20160920-143431_1028264283 using sql org.apache.zeppelin.interpreter.LazyOpenInterpreter@49b54b66
>>>>> INFO [2016-09-20 18:44:25,114] ({pool-1-thread-10} NotebookServer.java[afterStatusChange]:1150) - Job 20160920-143431_1028264283 is finished
>>>>> INFO [2016-09-20 18:44:25,142] ({pool-1-thread-10} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1474371271306_-871438747 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session476562431
>>>>> INFO [2016-09-20 18:47:19,430] ({qtp88558700-14} NotebookServer.java[onClose]:227) - Closed connection to 192.168.110.249 : 53565. (1001) null
>>>>>
>>>>> Always null and 1001 at the end.
>>>>>
>>>>> In Firefox it's sometimes OK. But if there are more than 3 plots it
>>>>> will hang too.
>>>>>
>>>>> How could we debug this?
>>>>>
>>>>> Regards,
>>>>> Andrey
>>>>>
>>>>
>>>> --
>>>> Alex Goodman
>>>> Data Scientist I
>>>> Science Data Modeling and Computing (398K)
>>>> Jet Propulsion Laboratory
>>>> California Institute of Technology
>>>> Tel: +1-818-354-6012
>>>>
>>
>> --
>> Alex Goodman
>> Data Scientist I
>> Science Data Modeling and Computing (398K)
>> Jet Propulsion Laboratory
>> California Institute of Technology
>> Tel: +1-818-354-6012
>>
>

--
Alex Goodman
Data Scientist I
Science Data Modeling and Computing (398K)
Jet Propulsion Laboratory
California Institute of Technology
Tel: +1-818-354-6012