Hi Alex,

Thank you, we will give PNG a try.

Our dataset is very small (for Big Data and Hadoop): only 4 MB. We have
40,000 rows x 17 columns. Not so big.

But it seems that 40k dots is too much for my browser. Also, maybe Zeppelin
should somehow disable only such difficult paragraphs and not the whole notebook.
And it's very difficult to change the notebook after this plot has been painted.
Is there any way to clear all results of a notebook before opening it?
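
One possible workaround is to strip the stored output from the notebook file itself before loading it in the browser. The sketch below is only an illustration under assumptions: it assumes the notebook is stored as JSON with a top-level "paragraphs" list, and that each paragraph keeps its output under a "result" key (or "results" in other versions). The function names are hypothetical and not part of Zeppelin.

```python
import json


def strip_results(note):
    """Return a copy of a notebook dict with stored paragraph output removed.

    Assumes the layout {"paragraphs": [{"text": ..., "result": ...}, ...]};
    the key names are an assumption about the notebook JSON format.
    """
    cleaned = dict(note)
    cleaned["paragraphs"] = [
        {k: v for k, v in p.items() if k not in ("result", "results")}
        for p in note.get("paragraphs", [])
    ]
    return cleaned


def strip_results_file(src_path, dst_path):
    """Read a notebook JSON file and write a copy without stored output."""
    with open(src_path) as src:
        note = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(strip_results(note), dst, indent=2)
```

Writing to a separate destination file keeps the original notebook untouched in case the assumed layout turns out to be wrong.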

If we do import os, then we get:

Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-3283164060812521118.py", line 239, in <module>
    eval(compiledCode)
  File "<string>", line 1, in <module>
  File "/opt/anaconda2/lib/python2.7/site-packages/pandas/tools/plotting.py", line 2951, in hist_series
    plt.figure(figsize=figsize))
  File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/pyplot.py", line 527, in figure
    **kwargs)
  File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager
    return new_figure_manager_given_figure(num, thisFig)
  File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure
    canvas = FigureCanvasQTAgg(figure)
  File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in __init__
    FigureCanvasQT.__init__(self, figure)
  File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt4.py", line 68, in __init__
    _create_qApp()
  File "/opt/anaconda2/lib/python2.7/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp
    raise RuntimeError('Invalid DISPLAY variable')
RuntimeError: Invalid DISPLAY variable

Also trying:


%pyspark
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
def show(p):
    z.show(plt, fmt="png")


%pyspark
raw_data['age'].hist(bins=20, color='g')
plt.xlabel('Age')
plt.ylabel('Number of people')
plt.title('Age distribution')
show(plt)
plt.close()


Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-7759686698822330483.py", line 239, in <module>
    eval(compiledCode)
  File "<string>", line 5, in <module>
  File "<string>", line 5, in show
TypeError: show() got an unexpected keyword argument 'fmt'
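
The error suggests that the `show()` being called here does not accept a `fmt` argument. A way around it that does not depend on `z.show()` at all is to render the figure to PNG in memory and print it as an inline image, in the same spirit as the SVG helper quoted below. This is only a minimal sketch: the names `png_bytes_to_html` and `show_png` are made up for illustration, and the 600px width is carried over from the SVG helper as an assumption.

```python
from __future__ import print_function

import base64
import io


def png_bytes_to_html(png_bytes, width=600):
    """Wrap raw PNG bytes in a %html paragraph with an inline <img> tag."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return (
        "%html <div style='width:{}px'>"
        "<img src='data:image/png;base64,{}' style='width:100%'/></div>"
    ).format(width, encoded)


def show_png(fig, width=600, dpi=100):
    """Save a figure (or the pyplot module) to PNG in memory and print it.

    `fig` only needs a savefig(file, format=..., dpi=...) method, which both
    matplotlib figures and matplotlib.pyplot provide.
    """
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    print(png_bytes_to_html(buf.getvalue(), width))
```

In the paragraph above, `show_png(plt)` would take the place of `show(plt)`. Since the browser receives a single raster image rather than one SVG element per point, the output size no longer grows with the number of dots.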



Regards,
Andrey


2016-09-27 20:53 GMT+03:00 Goodman, Alexander (398K) <
alexander.good...@jpl.nasa.gov>:

> Hi Andrey,
>
> These two lines:
>
> os.system("export DISPLAY=:0")
> plt.switch_backend('Agg')
>
> should not be necessary since you have already set the backend manually to
> AGG.
>
> More importantly, how large is your dataset? While SVG looks nice, it does
> not scale well with large datasets. I would suggest you try using PNG
> images instead. Some code for doing this can be found in this PR:
> https://github.com/apache/zeppelin/pull/1422. There is work ongoing to
> improve matplotlib integration with zeppelin even further than this, but
> this solution should be sufficient for you right now. If you are still
> having problems, let us know.
>
> Thanks,
> Alex
>
> On Tue, Sep 27, 2016 at 3:27 AM, Андрей Ривкин <amriv...@gmail.com> wrote:
>
>> Hi Alex,
>>
>> Here is the exported notebook and sample data.
>>
>> The notebook is quite heavy (19 MB). Where can I upload it?
>>
>> Here is some code sample:
>>
>> %pyspark
>>
>> import matplotlib
>> import os
>>
>> from pylab import figure, show, rand
>> from matplotlib.patches import Ellipse
>> import matplotlib.pyplot as plt
>> # helper function to display in Zeppelin
>>
>> matplotlib.use('Agg')
>> os.system("export DISPLAY=:0")
>> plt.switch_backend('Agg')
>>
>> import StringIO
>> def show(p):
>>   img = StringIO.StringIO()
>>   p.savefig(img, format='svg')
>>   img.seek(0)
>>   print "%html <div style='width:600px'>" + img.buf + "</div>"
>>
>>
>>
>>
>>
>> %pyspark
>> data_1 = data.ix[data['y']==1]
>> data_0 = data.ix[data['y']==0]
>> x_1 = data_1['balance'].values
>> y_1 = data_1['age'].values
>> x_0 = data_0['balance'].values
>> y_0 = data_0['age'].values
>> colors = ['red','green']
>> plt.figure(figsize=(10, 6))
>> plt.xlabel('Balance')
>> plt.ylabel('Age')
>> plt.title('')
>> plt.scatter(x_0, y_0, alpha=0.5, color='blue')
>> plt.scatter(x_1, y_1, alpha=0.5, color='red')  # matplotlib.colors.ListedColormap(colors)
>> plt.title('Distributions of balance by age and target value')
>> show(plt)
>> plt.close()
>>
>> Regards,
>> Andrey
>>
>> 2016-09-20 19:20 GMT+03:00 Goodman, Alexander (398K) <
>> alexander.good...@jpl.nasa.gov>:
>>
>>> Hi Andrey,
>>>
>>> Would you be able to post the code you were using so we can try to
>>> reproduce your problem including how you are generating the images inline
>>> (eg, is your chosen image format png or svg?).
>>>
>>> Thanks,
>>> Alex
>>>
>>> On Tue, Sep 20, 2016 at 9:12 AM, Андрей Ривкин <amriv...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> We are using Zeppelin 0.6.1 with Spark 1.6.2.
>>>>
>>>> We have a very simple demo and a small file. If we just want to calculate
>>>> something, it's OK.
>>>> But when we try to visualize using matplotlib, Zeppelin hangs (even the
>>>> scroll bar) and then disconnects.
>>>>
>>>> We are using Chrome.
>>>>
>>>> In logs just this:
>>>>
>>>>  INFO [2016-09-20 18:44:22,574] ({pool-1-thread-10}
>>>> Paragraph.java[jobRun]:252) - run paragraph 20160920-143431_1028264283
>>>> using sql org.apache.zeppelin.interpreter.LazyOpenInterpreter@49b54b66
>>>>  INFO [2016-09-20 18:44:25,114] ({pool-1-thread-10}
>>>> NotebookServer.java[afterStatusChange]:1150) - Job
>>>> 20160920-143431_1028264283 is finished
>>>>  INFO [2016-09-20 18:44:25,142] ({pool-1-thread-10}
>>>> SchedulerFactory.java[jobFinished]:137) - Job
>>>> paragraph_1474371271306_-871438747 finished by scheduler
>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpretershared_session476562431
>>>>  INFO [2016-09-20 18:47:19,430] ({qtp88558700-14}
>>>> NotebookServer.java[onClose]:227) - Closed connection to
>>>> 192.168.110.249 : 53565. (1001) null
>>>>
>>>> Always null and 1001 in the end.
>>>>
>>>> In Firefox it's sometimes OK. But if there are more than 3 plots, it
>>>> will hang too.
>>>>
>>>> How could we debug this?
>>>>
>>>> Regards,
>>>> Andrey
>>>>
>>>>
>>>
>>>
>>> --
>>> Alex Goodman
>>> Data Scientist I
>>> Science Data Modeling and Computing (398K)
>>> Jet Propulsion Laboratory
>>> California Institute of Technology
>>> Tel: +1-818-354-6012
>>>
>>
>>
>
>
>
