Hi Teng,

Thanks for the answer. I've switched to pandas for the proof-of-concept
phase so that I can plot graphs easily.

Actually, the pandas DataFrame object itself has a `plot` method, so these
objects can plot themselves easily in most cases (it uses matplotlib
internally).
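
To illustrate what I mean, here is a minimal sketch. The column names
`pc1`, `pc2`, `pc3` and the output file name are just placeholders I made
up, and the values are rounded from the output quoted further down:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import pandas as pd

# Placeholder data: three PCA components, rounded from the quoted output.
pdf = pd.DataFrame({
    "pc1": [-255.5, -477.0, -163.1, -53.7],
    "pc2": [2.9, -6.2, -4.6, 0.6],
    "pc3": [-0.5, -5.3, -1.2, -0.4],
})

# The DataFrame plots itself; pandas hands the drawing to matplotlib
# and returns the matplotlib Axes it drew on.
ax = pdf.plot(kind="scatter", x="pc1", y="pc2")
ax.figure.savefig("pca_scatter.png")
```

The returned `ax` is a regular matplotlib Axes, so it can be customized
further with the usual matplotlib calls.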

I wonder whether the Spark DataFrame API could move in that direction,
because plotting is really important during analysis, and converting a
data frame with the `toPandas()` method fails for data that does not fit
in the driver's memory.
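
As a workaround for now, one could pull only a bounded sample to the
driver and plot that. Below is a rough sketch, not a definitive
implementation: the helper names `sample_to_points` and `scatter3d`, the
fraction, the row cap and the file name are all my own assumptions. It only
relies on `select`, `sample`, `limit` and `toPandas` on the DataFrame side
and on `toArray()` of the vectors.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers "3d" on older matplotlib)


def sample_to_points(df, column, fraction, max_rows, seed=42):
    """Collect a bounded sample of a vector column as an (n, d) NumPy array.

    Hypothetical helper: `df` is assumed to be a Spark DataFrame whose
    `column` holds DenseVector values.
    """
    # sample() is approximate, so limit() puts a hard cap on the row count.
    sampled = df.select(column).sample(False, fraction, seed).limit(max_rows)
    pdf = sampled.toPandas()  # only the sample is brought to the driver
    return np.array([np.asarray(v.toArray()) for v in pdf[column]])


def scatter3d(points, path):
    """Draw the first three columns of `points` as a 3D scatter plot."""
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(points[:, 0], points[:, 1], points[:, 2])
    fig.savefig(path)
    return ax


# Usage, assuming `df` is the data frame from my original mail:
# points = sample_to_points(df, "pca_features", fraction=0.01, max_rows=10000)
# scatter3d(points, "pca_3d.png")
```

This still plots on the driver with plain matplotlib, so it shows only a
sample of the data, not the whole data frame.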

Although I'm not very familiar with the internals, I would be glad to help
in any way if the team considers adding such a feature.

On Wed, Mar 23, 2016 at 2:16 PM Teng Qiu <teng...@gmail.com> wrote:

> Hmm... then this sounds like a feature request for matplotlib: you
> would need matplotlib's APIs to support RDD or Spark DataFrame objects.
> I checked the API of mplot3d
> (http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html#mpl_toolkits.mplot3d.Axes3D.scatter),
> and it only supports "array-like" input data.
>
> So yes, to use matplotlib, you need to take the elements out of the RDD
> and pass them to the plot API as a list.
>
> 2016-03-23 12:20 GMT+01:00 Yavuz Nuzumlalı <manuya...@gmail.com>:
> > Thanks for the help, but the example that you referenced gets the values
> > from the RDD as a list and plots that list.
> >
> > What I was specifically asking is: is there a convenient way to plot a
> > DataFrame object directly (like pandas DataFrame objects)?
> >
> >
> > On Wed, Mar 23, 2016 at 11:47 AM Teng Qiu <teng...@gmail.com> wrote:
> >>
> >> Not sure about 3D plots, but there is a nice example of plotting an
> >> RDD or DataFrame using matplotlib:
> >>
> >> https://github.com/zalando/spark-appliance/blob/master/examples/notebooks/PySpark_sklearn_matplotlib.ipynb
> >>
> >> On Wednesday, March 23, 2016, Yavuz Nuzumlalı wrote:
> >> > Hi all,
> >> > I'm trying to plot the result of a simple PCA operation, but couldn't
> >> > find clear documentation on plotting data frames.
> >> > Here is the output of my data frame:
> >> > +----------------------------------------------------------------+
> >> > |pca_features                                                    |
> >> > +----------------------------------------------------------------+
> >> > |[-255.4681508918886,2.9340031372956155,-0.5357914079267039]     |
> >> > |[-477.03566189308367,-6.170290817861212,-5.280827588464785]     |
> >> > |[-163.13388125540507,-4.571443623272966,-1.2349427928939671]    |
> >> > |[-53.721252166903255,0.6162589419996329,-0.39569546286098245]   |
> >> > |[-27.97717473880869,0.30883567826481106,-0.11159555340377557]   |
> >> > |[-118.27508063853554,1.3484584740407748,-0.8088790388907207]    |
> >> > The values of the `pca_features` column are DenseVectors created
> >> > using VectorAssembler.
> >> > How can I draw a simple 3d scatter plot from this data frame?
> >> > Thanks
>
