Hi Andrey,

Hmm. Usually if you wait long enough the notebook should eventually open. You are right though, the only other way to fix this that I can think of is to edit the note.json file directly and remove the output yourself (you'll see it as a really long string contained in the SVG div tag).

As far as I know though, there isn't an option in the Zeppelin GUI to clear the output in a specific note from the main menu, so that would be a nice feature. I would consider filing a JIRA issue: https://issues.apache.org/jira/browse/ZEPPELIN/
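If editing note.json by hand is too painful, something like the following might do the cleanup for you. This is only a rough sketch under assumptions: I am assuming the file parses to a dict with a top-level "paragraphs" list and that each paragraph keeps its stored output under a "result" (or "results") key, so check those names against your own file first. The function names strip_outputs and clean_note_file are made up for this example.

```python
import json


def strip_outputs(note, output_keys=("result", "results")):
    """Remove stored output from a parsed note and count the paragraphs touched.

    Assumption: `note` is the dict loaded from note.json, with a top-level
    "paragraphs" list whose entries keep their output under one of
    `output_keys`. The key names may differ between Zeppelin versions.
    """
    cleared = 0
    for paragraph in note.get("paragraphs", []):
        removed = [paragraph.pop(key) for key in output_keys if key in paragraph]
        if removed:
            cleared += 1
    return cleared


def clean_note_file(path):
    """Rewrite the note.json at `path` in place, without paragraph output."""
    with open(path, encoding="utf-8") as f:
        note = json.load(f)
    cleared = strip_outputs(note)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(note, f, indent=2, ensure_ascii=False)
    return cleared
```

Make a backup copy of note.json before running clean_note_file on it, and it is probably safest to do this while Zeppelin is stopped so the file is not overwritten underneath you.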
Thanks,
Alex

On Tue, Sep 27, 2016 at 3:10 PM, Андрей Ривкин <amriv...@gmail.com> wrote:

> Hi Alex,
>
> This helped! Great! Thank you!
>
> As for the option to hide the paragraph output, there is a problem: if the
> SVG is lagging, you can't open the notebook at all. So maybe there is some
> way to clean all notebook output before opening it?
> We could also change the notebook JSON file directly on disk.
>
> Again, thank you for your help.
>
> Regards,
> Andrey
>
> 2016-09-28 0:45 GMT+03:00 Goodman, Alexander (398K) <alexander.good...@jpl.nasa.gov>:
>
>> Hi Andrey,
>>
>> To get rid of the lag the SVG images are causing in your notebook, you
>> can hide the paragraph output. Look for this icon in the upper right-hand
>> corner of the paragraph: https://puu.sh/rq0JU/6fa29f2ff9.png
>>
>> For your first problem, matplotlib is very inflexible when it comes to
>> setting the backend. The default backend on most systems is set to Qt4Agg,
>> which is a GUI backend and therefore requires DISPLAY to be set in your
>> environment (e.g. through X11). Hence, you should always call
>> matplotlib.use('Agg') before making calls to (AND imports of) any other
>> plotting functions. In fact, it is good practice to do this in your very
>> first paragraph cell before running all others. plt.switch_backend() can
>> work in certain circumstances, but a safe bet is to restart the
>> interpreter through the Interpreter menu in Zeppelin and then run the
>> paragraphs again. If you don't want to do this for every notebook, your
>> best bet is to change the default backend to Agg in your matplotlibrc
>> file. Part of the ongoing development work for Zeppelin will involve
>> creating a custom matplotlib backend that is used by default, so users
>> won't have to worry about this stuff in the future.
>>
>> For the second problem, the PR that I linked you to has not been merged,
>> and only works with the python (not pyspark) interpreter.
>> You'll need to define the show function yourself somewhere in your
>> notebook. Hope this helps.
>>
>> Thanks,
>> Alex
>>
>> On Tue, Sep 27, 2016 at 2:22 PM, Андрей Ривкин <amriv...@gmail.com> wrote:
>>
>>> Hi Alex,
>>>
>>> Thank you, we will give PNG a try.
>>>
>>> Our dataset is very small (for Big Data and Hadoop): only 4 MB. We have
>>> 40,000 rows x 17 columns. Not so big.
>>>
>>> But it seems that 40k dots is too much for my browser. Also, maybe
>>> Zeppelin should somehow disable such difficult paragraphs rather than
>>> the whole notebook.
>>> It is also very difficult to change the notebook after this plot has
>>> been painted.
>>> Is there any way to clean up all results of a notebook before opening it?
>>>
>>> If we do import os, then we get:
>>>
>>> Traceback (most recent call last):
>>>   File "/tmp/zeppelin_pyspark-3283164060812521118.py", line 239, in <module>
>>>     eval(compiledCode)
>>>   File "<string>", line 1, in <module>
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2951, in hist_series
>>>     plt.figure(figsize=figsize))
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 527, in figure
>>>     **kwargs)
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager
>>>     return new_figure_manager_given_figure(num, thisFig)
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure
>>>     canvas = FigureCanvasQTAgg(figure)
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in __init__
>>>     FigureCanvasQT.__init__(self, figure)
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 68, in __init__
>>>     _create_qApp()
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp
>>>     raise RuntimeError('Invalid DISPLAY variable')
>>> RuntimeError: Invalid DISPLAY variable
>>>
>>> We also tried:
>>>
>>> %pyspark
>>> import matplotlib
>>> matplotlib.use('Agg')
>>> import matplotlib.pyplot as plt
>>> def show(p):
>>>     z.show(plt, fmt="png")
>>>
>>> %pyspark
>>> raw_data['age'].hist(bins=20, color='g')
>>> plt.xlabel('Age')
>>> plt.ylabel('Number of people')
>>> plt.title('Age distribution')
>>> show(plt)
>>> plt.close()
>>>
>>> Traceback (most recent call last):
>>>   File "/tmp/zeppelin_pyspark-7759686698822330483.py", line 239, in <module>
>>>     eval(compiledCode)
>>>   File "<string>", line 5, in <module>
>>>   File "<string>", line 5, in show
>>> TypeError: show() got an unexpected keyword argument 'fmt'
>>>
>>> Regards,
>>> Andrey
>>>
>>> 2016-09-27 20:53 GMT+03:00 Goodman, Alexander (398K) <alexander.good...@jpl.nasa.gov>:
>>>
>>>> Hi Andrey,
>>>>
>>>> These two lines:
>>>>
>>>> os.system("export DISPLAY=:0")
>>>> plt.switch_backend('Agg')
>>>>
>>>> should not be necessary since you have already set the backend manually
>>>> to Agg.
>>>>
>>>> More importantly, how large is your dataset? While SVG looks nice, it
>>>> does not scale well with large datasets. I would suggest you try using
>>>> PNG images instead. Some code for doing this can be found in this PR:
>>>> https://github.com/apache/zeppelin/pull/1422. There is work ongoing to
>>>> improve matplotlib integration with Zeppelin even further than this, but
>>>> this solution should be sufficient for you right now. If you are still
>>>> having problems, let us know.
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> On Tue, Sep 27, 2016 at 3:27 AM, Андрей Ривкин <amriv...@gmail.com> wrote:
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> Here is the exported notebook and sample data.
>>>>>
>>>>> The notebook is quite heavy (19 MB). Where can I upload it?
>>>>>
>>>>> Here is a code sample:
>>>>>
>>>>> %pyspark
>>>>>
>>>>> import matplotlib
>>>>> import os
>>>>>
>>>>> from pylab import figure, show, rand
>>>>> from matplotlib.patches import Ellipse
>>>>> import matplotlib.pyplot as plt
>>>>> # helper function to display in Zeppelin
>>>>>
>>>>> matplotlib.use('Agg')
>>>>> os.system("export DISPLAY=:0")
>>>>> plt.switch_backend('Agg')
>>>>>
>>>>> import StringIO
>>>>> def show(p):
>>>>>     img = StringIO.StringIO()
>>>>>     p.savefig(img, format='svg')
>>>>>     img.seek(0)
>>>>>     print "%html <div style='width:600px'>" + img.buf + "</div>"
>>>>>
>>>>> %pyspark
>>>>> data_1 = data.ix[data['y']==1]
>>>>> data_0 = data.ix[data['y']==0]
>>>>> x_1 = data_1['balance'].values
>>>>> y_1 = data_1['age'].values
>>>>> x_0 = data_0['balance'].values
>>>>> y_0 = data_0['age'].values
>>>>> colors = ['red','green']
>>>>> plt.figure(figsize=(10, 6))
>>>>> plt.xlabel('Balance')
>>>>> plt.ylabel('Age')
>>>>> plt.title('')
>>>>> plt.scatter(x_0, y_0, alpha=0.5, color='blue')
>>>>> plt.scatter(x_1, y_1, alpha=0.5, color='red')  # matplotlib.colors.ListedColormap(colors)
>>>>> plt.title('Distributions of balance by age and target value')
>>>>> show(plt)
>>>>> plt.close()
>>>>>
>>>>> Regards,
>>>>> Andrey
>>>>>
>>>>> 2016-09-20 19:20 GMT+03:00 Goodman, Alexander (398K) <alexander.good...@jpl.nasa.gov>:
>>>>>
>>>>>> Hi Andrey,
>>>>>>
>>>>>> Would you be able to post the code you were using so we can try to
>>>>>> reproduce your problem, including how you are generating the images
>>>>>> inline (e.g., is your chosen image format PNG or SVG)?
>>>>>>
>>>>>> Thanks,
>>>>>> Alex
>>>>>>
>>>>>> On Tue, Sep 20, 2016 at 9:12 AM, Андрей Ривкин <amriv...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are using Zeppelin 0.6.1 with Spark 1.6.2.
>>>>>>>
>>>>>>> We have a very simple demo and a small file. If we just want to
>>>>>>> calculate something, it's OK.
>>>>>>> But when we try to visualize using matplotlib, Zeppelin hangs (even
>>>>>>> the scroll bar) and then disconnects.
>>>>>>>
>>>>>>> We are using Chrome.
>>>>>>>
>>>>>>> In the logs there is just this:
>>>>>>>
>>>>>>> INFO [2016-09-20 18:44:22,574] ({pool-1-thread-10} Paragraph.java[jobRun]:252) - run paragraph 20160920-143431_1028264283 using sql org.apache.zeppelin.interpreter.LazyOpenInterpreter@49b54b66
>>>>>>> INFO [2016-09-20 18:44:25,114] ({pool-1-thread-10} NotebookServer.java[afterStatusChange]:1150) - Job 20160920-143431_1028264283 is finished
>>>>>>> INFO [2016-09-20 18:44:25,142] ({pool-1-thread-10} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1474371271306_-871438747 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session476562431
>>>>>>> INFO [2016-09-20 18:47:19,430] ({qtp88558700-14} NotebookServer.java[onClose]:227) - Closed connection to 192.168.110.249 : 53565. (1001) null
>>>>>>>
>>>>>>> It is always null and 1001 at the end.
>>>>>>>
>>>>>>> In Firefox it's sometimes OK. But if there are more than 3 plots, it
>>>>>>> will hang too.
>>>>>>>
>>>>>>> How could we debug this?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andrey
>>>>>>
>>>>>> --
>>>>>> Alex Goodman
>>>>>> Data Scientist I
>>>>>> Science Data Modeling and Computing (398K)
>>>>>> Jet Propulsion Laboratory
>>>>>> California Institute of Technology
>>>>>> Tel: +1-818-354-6012

--
Alex Goodman
Data Scientist I
Science Data Modeling and Computing (398K)
Jet Propulsion Laboratory
California Institute of Technology
Tel: +1-818-354-6012
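
P.S. For reference, here is a rough sketch of the PNG approach suggested earlier in this thread. It is not the code from the PR, just an illustration along the same lines as the SVG show helper quoted above: the figure is saved to an in-memory PNG and embedded as a base64 data URI. The name png_html and its width parameter are made up for this example, and it assumes that output starting with %html is rendered as HTML, as the quoted helper relies on.

```python
import base64
import io


def png_html(fig, width=600):
    """Build a %html string that embeds `fig` as an inline PNG image.

    `fig` can be anything exposing savefig(buffer, format=...), for example
    the matplotlib.pyplot module or a Figure object.
    """
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    return "%html <img src='data:image/png;base64,{}' width='{}'/>".format(
        encoded, width
    )


# Intended use in a notebook paragraph (needs matplotlib, so left commented):
#   import matplotlib
#   matplotlib.use('Agg')            # select the backend before importing pyplot
#   import matplotlib.pyplot as plt
#   plt.scatter([1, 2, 3], [4, 5, 6])
#   print(png_html(plt))
#   plt.close()
```

A PNG stays roughly the same size however many points are plotted, whereas the SVG output grows with every marker, which is why it should be much lighter on the browser for a 40k-point scatter plot.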