Hi Andrey,

Hmm. Usually, if you wait long enough, the notebook should eventually open.
You are right though: the only other fix I can think of is to edit the
note.json file directly and remove the output yourself (you'll see it as a
really long string inside the div tag that wraps the SVG). As far as I know,
the Zeppelin GUI has no option in the main menu to clear the output of a
specific note, so that would be a nice feature. I would consider filing a
JIRA issue:
https://issues.apache.org/jira/browse/ZEPPELIN/
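
If editing note.json by hand is too painful, something along these lines
might do it. This is only a rough sketch: it assumes the note is a JSON object
with a top-level "paragraphs" list and that each paragraph keeps its output
under a "result" (or "results") key, which you should verify against your own
file first. The function names are made up for illustration. Back up the file
before trying it; Zeppelin may also need a restart to pick up the change.

```python
import json


def strip_outputs(note):
    """Return a copy of a parsed note with paragraph output removed.

    Assumption: output lives under a "result" or "results" key in each
    entry of note["paragraphs"]. Check your own note.json before relying
    on this.
    """
    cleaned = dict(note)
    cleaned["paragraphs"] = [
        {key: value for key, value in paragraph.items()
         if key not in ("result", "results")}
        for paragraph in note.get("paragraphs", [])
    ]
    return cleaned


def clear_note_file(path):
    """Rewrite the note.json at `path` in place without paragraph output."""
    with open(path) as f:
        note = json.load(f)
    with open(path, "w") as f:
        json.dump(strip_outputs(note), f, indent=2)
```

You would then call clear_note_file() with the path to the note.json of the
affected note. The paragraph code itself is left untouched, so the plots can
be regenerated by running the paragraphs again.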

Thanks,
Alex

On Tue, Sep 27, 2016 at 3:10 PM, Андрей Ривкин <amriv...@gmail.com> wrote:

> Hi Alex,
>
> This helped! Great! Thank you!
>
> As for the option to hide the Paragraph output, there is a problem: if the
> SVG is lagging, you can't open the notebook at all. So maybe there is some
> way to clear all notebook output before opening it?
> We could also edit the notebook JSON file directly on disk.
>
> Again, thank you for help.
>
> Regards,
> Andrey
>
>
>
>
>
> 2016-09-28 0:45 GMT+03:00 Goodman, Alexander (398K) <
> alexander.good...@jpl.nasa.gov>:
>
>> Hi Andrey,
>>
>> To get rid of the lag the SVG images are causing in your notebook, you
>> can hide the Paragraph output. Look for this icon in the upper right hand
>> corner of the paragraph: https://puu.sh/rq0JU/6fa29f2ff9.png
>>
>> For your first problem, matplotlib is very inflexible when it comes to
>> setting the backend. The default backend on most systems is set to Qt4Agg,
>> which is a GUI backend and therefore requires DISPLAY to be set in your
>> environment (e.g. through X11). Hence, you should always call
>> matplotlib.use('Agg') before importing matplotlib.pyplot or pylab and
>> before calling any plotting functions. In fact, it is good practice to do
>> this in your very first paragraph before running all others.
>> plt.switch_backend() can work in certain circumstances, but a safe bet is
>> to restart the interpreter through the Interpreter menu in Zeppelin and
>> then run the paragraphs again. If you don't want to do this for every
>> notebook, your best bet is to change the default backend to Agg in your
>> matplotlibrc file. Part of the ongoing development work for Zeppelin will
>> involve creating a custom matplotlib backend that is used by default, so
>> users won't have to worry about this stuff in the future.
>>
>> For the second problem, the PR that I linked you to has not been merged,
>> and it only works with the python (not pyspark) interpreter. You'll need to
>> define the show function yourself somewhere in your notebook. Hope this
>> helps.
>>
>> Thanks,
>> Alex
>>
>> On Tue, Sep 27, 2016 at 2:22 PM, Андрей Ривкин <amriv...@gmail.com>
>> wrote:
>>
>>> Hi Alex,
>>>
>>> Thank you, we will give PNG a try.
>>>
>>> Our dataset is very small (for Big Data and Hadoop): only 4 MB. We have
>>> 40,000 rows x 17 columns. Not so big.
>>>
>>> But it seems that 40k dots is too much for my browser. Also, maybe
>>> Zeppelin should somehow disable only such difficult paragraphs and not the
>>> whole notebook.
>>> It's also very difficult to change the notebook after this plot has been
>>> rendered. Is there any way to clear all results of a notebook before
>>> opening it?
>>>
>>> If we do import os, then we get:
>>>
>>> Traceback (most recent call last):
>>>   File "/tmp/zeppelin_pyspark-3283164060812521118.py", line 239, in <module>
>>>     eval(compiledCode)
>>>   File "<string>", line 1, in <module>
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2951, in hist_series
>>>     plt.figure(figsize=figsize))
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 527, in figure
>>>     **kwargs)
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager
>>>     return new_figure_manager_given_figure(num, thisFig)
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure
>>>     canvas = FigureCanvasQTAgg(figure)
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in __init__
>>>     FigureCanvasQT.__init__(self, figure)
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 68, in __init__
>>>     _create_qApp()
>>>   File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp
>>>     raise RuntimeError('Invalid DISPLAY variable')
>>> RuntimeError: Invalid DISPLAY variable
>>>
>>> Also trying:
>>>
>>> %pyspark
>>> import matplotlib
>>> matplotlib.use('Agg')
>>> import matplotlib.pyplot as plt
>>> def show(p):
>>>     z.show(plt, fmt="png")
>>>
>>> %pyspark
>>> raw_data['age'].hist(bins=20, color='g')
>>> plt.xlabel('Age')
>>> plt.ylabel('Number of people')
>>> plt.title('Age distribution')
>>> show(plt)
>>> plt.close()
>>>
>>>
>>> Traceback (most recent call last):
>>>   File "/tmp/zeppelin_pyspark-7759686698822330483.py", line 239, in <module>
>>>     eval(compiledCode)
>>>   File "<string>", line 5, in <module>
>>>   File "<string>", line 5, in show
>>> TypeError: show() got an unexpected keyword argument 'fmt'
>>>
>>>
>>>
>>> Regards,
>>> Andrey
>>>
>>>
>>> 2016-09-27 20:53 GMT+03:00 Goodman, Alexander (398K) <
>>> alexander.good...@jpl.nasa.gov>:
>>>
>>>> Hi Andrey,
>>>>
>>>> These two lines:
>>>>
>>>> os.system("export DISPLAY=:0")
>>>> plt.switch_backend('Agg')
>>>>
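>>>> should not be necessary since you have already set the backend manually
>>>> to Agg. (The os.system call also has no effect on the interpreter: the
>>>> export only applies to the short-lived child shell it spawns.)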
>>>>
>>>> More importantly, how large is your dataset? While SVG looks nice, it
>>>> does not scale well with large datasets. I would suggest you try using PNG
>>>> images instead. Some code for doing this can be found in this PR:
>>>> https://github.com/apache/zeppelin/pull/1422. There is work ongoing to
>>>> improve matplotlib integration with zeppelin even further than this, but
>>>> this solution should be sufficient for you right now. If you are still
>>>> having problems, let us know.
>>>>
>>>> Thanks,
>>>> Alex
>>>>
>>>> On Tue, Sep 27, 2016 at 3:27 AM, Андрей Ривкин <amriv...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> Here is the exported notebook and sample data.
>>>>>
>>>>> The notebook is quite heavy (19 MB). Where can I upload it?
>>>>>
>>>>> Here is a code sample:
>>>>>
>>>>> %pyspark
>>>>>
>>>>> import matplotlib
>>>>> import os
>>>>>
>>>>> from pylab import figure, show, rand
>>>>> from matplotlib.patches import Ellipse
>>>>> import matplotlib.pyplot as plt
>>>>> # helper function to display in Zeppelin
>>>>>
>>>>> matplotlib.use('Agg')
>>>>> os.system("export DISPLAY=:0")
>>>>> plt.switch_backend('Agg')
>>>>>
>>>>> import StringIO
>>>>> def show(p):
>>>>>   img = StringIO.StringIO()
>>>>>   p.savefig(img, format='svg')
>>>>>   img.seek(0)
>>>>>   print "%html <div style='width:600px'>" + img.buf + "</div>"
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> %pyspark
>>>>> data_1 = data.ix[data['y']==1]
>>>>> data_0 = data.ix[data['y']==0]
>>>>> x_1 = data_1['balance'].values
>>>>> y_1 = data_1['age'].values
>>>>> x_0 = data_0['balance'].values
>>>>> y_0 = data_0['age'].values
>>>>> colors = ['red','green']
>>>>> plt.figure(figsize=(10, 6))
>>>>> plt.xlabel('Balance')
>>>>> plt.ylabel('Age')
>>>>> plt.title('')
>>>>> plt.scatter(x_0, y_0, alpha=0.5, color='blue')
>>>>> plt.scatter(x_1, y_1, alpha=0.5, color='red')  # matplotlib.colors.ListedColormap(colors)
>>>>> plt.title('Distributions of balance by age and target value')
>>>>> show(plt)
>>>>> plt.close()
>>>>>
>>>>> Regards,
>>>>> Andrey
>>>>>
>>>>> 2016-09-20 19:20 GMT+03:00 Goodman, Alexander (398K) <
>>>>> alexander.good...@jpl.nasa.gov>:
>>>>>
>>>>>> Hi Andrey,
>>>>>>
>>>>>> Would you be able to post the code you were using so we can try to
>>>>>> reproduce your problem, including how you are generating the images
>>>>>> inline (e.g., is your chosen image format PNG or SVG)?
>>>>>>
>>>>>> Thanks,
>>>>>> Alex
>>>>>>
>>>>>> On Tue, Sep 20, 2016 at 9:12 AM, Андрей Ривкин <amriv...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are using Zeppelin 0.6.1 with Spark 1.6.2.
>>>>>>>
>>>>>>> We have a very simple demo and a small file. If we just want to
>>>>>>> calculate something, it's OK.
>>>>>>> But when we try to visualize using matplotlib, Zeppelin hangs (even
>>>>>>> the scroll bar) and then disconnects.
>>>>>>>
>>>>>>> We are using Chrome.
>>>>>>>
>>>>>>> In logs just this:
>>>>>>>
>>>>>>>  INFO [2016-09-20 18:44:22,574] ({pool-1-thread-10} Paragraph.java[jobRun]:252) - run paragraph 20160920-143431_1028264283 using sql org.apache.zeppelin.interpreter.LazyOpenInterpreter@49b54b66
>>>>>>>  INFO [2016-09-20 18:44:25,114] ({pool-1-thread-10} NotebookServer.java[afterStatusChange]:1150) - Job 20160920-143431_1028264283 is finished
>>>>>>>  INFO [2016-09-20 18:44:25,142] ({pool-1-thread-10} SchedulerFactory.java[jobFinished]:137) - Job paragraph_1474371271306_-871438747 finished by scheduler org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session476562431
>>>>>>>  INFO [2016-09-20 18:47:19,430] ({qtp88558700-14} NotebookServer.java[onClose]:227) - Closed connection to 192.168.110.249 : 53565. (1001) null
>>>>>>>
>>>>>>> Always null and 1001 in the end.
>>>>>>>
>>>>>>> In Firefox it's sometimes OK. But if there are more than 3 plots, it
>>>>>>> will hang too.
>>>>>>>
>>>>>>> How could we debug this?
>>>>>>>
>>>>>>> Regards,
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Alex Goodman
>>>>>> Data Scientist I
>>>>>> Science Data Modeling and Computing (398K)
>>>>>> Jet Propulsion Laboratory
>>>>>> California Institute of Technology
>>>>>> Tel: +1-818-354-6012


-- 
Alex Goodman
Data Scientist I
Science Data Modeling and Computing (398K)
Jet Propulsion Laboratory
California Institute of Technology
Tel: +1-818-354-6012
