Hi Teng,

Thanks for the answer. For the proof of concept I've switched to pandas so that I can plot graphs easily.
Actually, a pandas DataFrame has its own `plot` methods, so in most cases it can plot itself easily (it uses matplotlib internally). I wonder whether the Spark DataFrame API would consider moving in that direction, because plotting is really important during analysis, and converting a data frame with `toPandas()` fails for data that does not fit in memory. Although I'm not very familiar with the internals, I would be glad to help if the team considers adding such a feature.

On Wed, Mar 23, 2016 at 2:16 PM Teng Qiu <teng...@gmail.com> wrote:

> e... then this sounds like a feature request for matplotlib: you
> need to make matplotlib's APIs support RDD or Spark DataFrame objects.
> I checked the API of mplot3d
> (http://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html#mpl_toolkits.mplot3d.Axes3D.scatter),
> it only supports "array-like" input data.
>
> so yes, to use matplotlib, you need to take the elements out of the RDD
> and send them to the plot API as a list object.
>
> 2016-03-23 12:20 GMT+01:00 Yavuz Nuzumlalı <manuya...@gmail.com>:
> > Thanks for the help, but the example that you referenced gets the values
> > from the RDD as a list and plots that list.
> >
> > What I was specifically asking is whether there is a convenient way to
> > plot a DataFrame object directly (like pandas DataFrame objects)?
> >
> > On Wed, Mar 23, 2016 at 11:47 AM Teng Qiu <teng...@gmail.com> wrote:
> >>
> >> not sure about 3d plot, but there is a nice example:
> >>
> >> https://github.com/zalando/spark-appliance/blob/master/examples/notebooks/PySpark_sklearn_matplotlib.ipynb
> >>
> >> for plotting rdd or dataframe using matplotlib.
> >>
> >> On Wednesday, March 23, 2016, Yavuz Nuzumlalı wrote:
> >> > Hi all,
> >> > I'm trying to plot the result of a simple PCA operation, but couldn't
> >> > find clear documentation about plotting data frames.
> >> > Here is the output of my data frame:
> >> > +----------------------------------------------------------------+
> >> > |pca_features |
> >> > +----------------------------------------------------------------+
> >> > |[-255.4681508918886,2.9340031372956155,-0.5357914079267039] |
> >> > |[-477.03566189308367,-6.170290817861212,-5.280827588464785] |
> >> > |[-163.13388125540507,-4.571443623272966,-1.2349427928939671] |
> >> > |[-53.721252166903255,0.6162589419996329,-0.39569546286098245] |
> >> > |[-27.97717473880869,0.30883567826481106,-0.11159555340377557] |
> >> > |[-118.27508063853554,1.3484584740407748,-0.8088790388907207] |
> >> > Values of the `pca_features` column are DenseVectors created using
> >> > VectorAssembler.
> >> > How can I draw a simple 3d scatter plot from this data frame?
> >> > Thanks
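
For reference, here is a minimal sketch of the workaround discussed in this thread: take the vectors out of the data frame and hand plain lists to matplotlib. The helper names `split_columns` and `plot_scatter3d`, the output file name, and the 1% sampling fraction in the comment are all hypothetical, and the Spark calls are shown only as comments because they assume a data frame `df` like the one in the original question. Sampling before `collect()` is one way to keep the driver from running out of memory; it is a suggestion, not something the thread confirms.

```python
def split_columns(vectors):
    """Turn an iterable of 3-element vectors into three coordinate lists."""
    xs, ys, zs = [], [], []
    for vec in vectors:
        # Unpacking raises ValueError unless the vector has exactly 3 components.
        x, y, z = (float(c) for c in vec)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def plot_scatter3d(xs, ys, zs, path="pca_scatter.png"):
    """Draw a 3d scatter plot with matplotlib and save it to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, no display needed
    import matplotlib.pyplot as plt
    # Importing Axes3D registers the "3d" projection on older matplotlib versions.
    from mpl_toolkits.mplot3d import Axes3D  # noqa: F401

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(xs, ys, zs)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.set_zlabel("PC3")
    fig.savefig(path)
    plt.close(fig)
    return path


# With Spark (assumption, not run here), the vectors could be collected as:
#   rows = df.select("pca_features").sample(False, 0.01).collect()
#   vectors = [row.pca_features.toArray() for row in rows]
# The first two rows of the table in the original question stand in for that result.
vectors = [
    [-255.4681508918886, 2.9340031372956155, -0.5357914079267039],
    [-477.03566189308367, -6.170290817861212, -5.280827588464785],
]
xs, ys, zs = split_columns(vectors)
```

Calling `plot_scatter3d(xs, ys, zs)` afterwards should write the figure to `pca_scatter.png`, provided matplotlib is installed.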