Re: spark and plot data

Andrew Ehrlich Sat, 23 Jul 2016 12:02:20 -0700

@Gourav, did you find any good inline plotting tools when using the Scala
kernel? I found one based on Highcharts, but it was not as frictionless as
matplotlib.


> On Jul 23, 2016, at 2:26 AM, Gourav Sengupta <gourav.sengu...@gmail.com> 
> wrote:
> 
> Hi Pedro,
> 
> Toree is a Scala kernel for Jupyter, in case anyone needs a short intro. I use
> it regularly (when I am not using IntelliJ) and it's quite good.
> 
> Regards,
> Gourav
> 
> On Fri, Jul 22, 2016 at 11:15 PM, Pedro Rodriguez <ski.rodrig...@gmail.com>
> wrote:
> As of the most recent 0.6.0 release this is partially alleviated, but it is
> still not great (compared to something like Jupyter).
> 
> Notebooks can be "downloaded", but that is only really useful for importing
> them back into Zeppelin. It would be great if they could be exported as HTML
> or PDF, but at present they can't be. I know there is some sort of git
> support, but it was never clear to me how it was supposed to be used, since
> the docs are sparse on that. So far what works best for us is S3 storage, but
> that way you don't get the benefits of GitHub (history, commits, etc.).
> 
> There are a couple of other notebooks floating around; Apache Toree seems the
> most promising for portability since it's based on Jupyter:
> https://github.com/apache/incubator-toree
> 
> On Fri, Jul 22, 2016 at 3:53 PM, Gourav Sengupta <gourav.sengu...@gmail.com>
> wrote:
> The biggest stumbling block to using Zeppelin has been that we cannot
> download the notebooks, cannot export them, and certainly cannot sync them
> back to GitHub without mind-numbing and sometimes irritating hacks. Have
> those issues been resolved?
> 
> 
> Regards,
> Gourav  
> 
> 
> On Fri, Jul 22, 2016 at 2:22 PM, Pedro Rodriguez <ski.rodrig...@gmail.com>
> wrote:
> Zeppelin works great. The other thing we have done, in notebooks that let
> several languages share one Spark session (like Zeppelin or Databricks), is
> register Spark SQL temp tables in our Scala code, then use Python as an
> escape hatch for plotting with seaborn/matplotlib when the built-in plots are
> insufficient.
> 
> —
> Pedro Rodriguez
> PhD Student in Large-Scale Machine Learning | CU Boulder
> Systems Oriented Data Scientist
> UC Berkeley AMPLab Alumni
> 
> pedrorodriguez.io | 909-353-4423
> github.com/EntilZha | LinkedIn: https://www.linkedin.com/in/pedrorodriguezscience
> 
> On July 22, 2016 at 3:04:48 AM, Marco Colombo (ing.marco.colo...@gmail.com)
> wrote:
> 
>> Take a look at zeppelin
>> 
>> http://zeppelin.apache.org
>> 
>> On Thursday, 21 July 2016, Andy Davidson <a...@santacruzintegration.com>
>> wrote:
>> Hi Pseudo
>> 
>> Plotting, graphing, data visualization, report generation are common needs 
>> in scientific and enterprise computing.
>> 
>> Can you tell me more about your use case? What is it about the current
>> process/workflow that you think could be improved by pushing plotting (I
>> assume you mean plotting and graphing) into Spark?
>> 
>> 
>> In my personal work, all the graphing is done in the driver on summary
>> stats calculated using Spark, so for me using standard Python libs has not
>> been a problem.
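Below is a minimal sketch of the "summary stats in Spark, plot in the driver" pattern. Plain Python lists stand in for partitions, so it runs without a cluster. Each partition is reduced to a fixed-bucket histogram, and the small per-partition results are then merged. Only the merged counts would need to reach the driver for plotting. In PySpark, `RDD.histogram(buckets)` performs this distributed step. The helper names and sample data below are hypothetical.

```python
import bisect
from functools import reduce


def partition_histogram(values, edges):
    """Count values into buckets defined by sorted `edges` (hypothetical helper).

    Bucket i covers [edges[i], edges[i+1]); the last bucket is closed on the
    right, the convention PySpark's RDD.histogram documents.
    Values outside [edges[0], edges[-1]] are skipped in this sketch.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        i = bisect.bisect_right(edges, v) - 1
        if i == len(counts):  # v equals the last edge: keep it in the last bucket
            i -= 1
        counts[i] += 1
    return counts


def merge_counts(a, b):
    """Combine two per-partition histograms bucket by bucket."""
    return [x + y for x, y in zip(a, b)]


# Hypothetical data: three "partitions" of one numeric column.
partitions = [[1.0, 4.5, 9.9], [10.0, 15.2], [20.0, 3.3, 25.0]]
edges = [0.0, 10.0, 20.0]

# On a cluster this map step would run on the executors...
per_partition = [partition_histogram(p, edges) for p in partitions]
# ...and only these tiny lists are combined and sent to the driver.
counts = reduce(merge_counts, per_partition)
print(counts)  # [4, 3]
```

The merged `counts` list has one entry per bucket however many rows went in. It can therefore be passed straight to a plotting call such as matplotlib's `bar`.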
>> 
>> Andy
>> 
>> From: pseudo oduesp <pseudo20...@gmail.com>
>> Date: Thursday, July 21, 2016 at 8:30 AM
>> To: "user @spark" <user@spark.apache.org>
>> Subject: spark and plot data
>> 
>> Hi,
>> I know Spark is an engine for computing over large data sets. I work with
>> PySpark, and it's a wonderful tool.
>> 
>> My question: we don't have tools for plotting data, so each time we have to
>> switch back to plain Python to plot. But when the result is large, as for a
>> scatter plot or an ROC curve, you can't use collect to bring the data back.
>> 
>> Does anyone have a suggestion for plotting?
>> 
>> Thanks
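For the ROC case, one way to avoid `collect` on the raw rows is to bucket the scores first. The sketch below does this, assuming scores in [0, 1] and 0/1 labels.

Counting positives and negatives per bucket is the part a cluster would do, for example a groupBy on a computed bucket column. Its result has at most `n_bins` entries however large the data is, so the driver can sweep the thresholds and build the curve. Plain Python stands in for Spark here, and the names and data are hypothetical.

For scatter plots, the analogous move is to sample (e.g. `DataFrame.sample`) before collecting. If only the area under the curve is needed, `pyspark.mllib.evaluation.BinaryClassificationMetrics` can compute it on the cluster.

```python
from collections import Counter


def bin_counts(scored, n_bins):
    """Count positives and negatives per score bucket (hypothetical helper).

    `scored` yields (score, label) pairs with score in [0, 1] and label 1/0.
    This is the aggregation a cluster would run; each Counter ends up with
    at most n_bins keys, regardless of input size.
    """
    pos, neg = Counter(), Counter()
    for score, label in scored:
        b = min(int(score * n_bins), n_bins - 1)  # score 1.0 joins the top bucket
        if label == 1:
            pos[b] += 1
        else:
            neg[b] += 1
    return pos, neg


def roc_points(pos, neg, n_bins):
    """Sweep the threshold from the top bucket down; return (FPR, TPR) points."""
    total_pos, total_neg = sum(pos.values()), sum(neg.values())
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for b in range(n_bins - 1, -1, -1):
        tp += pos[b]  # missing buckets count as 0
        fp += neg[b]
        points.append((fp / total_neg, tp / total_pos))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical scored data; in a real job this would be a large DataFrame.
scored = [(0.95, 1), (0.85, 1), (0.75, 0), (0.45, 1), (0.25, 0), (0.15, 0)]
pos, neg = bin_counts(scored, n_bins=10)
points = roc_points(pos, neg, n_bins=10)  # 11 points, small enough to plot
print(round(trapezoid_auc(points), 4))  # 0.8889
```

Bucketing coarsens the curve: scores that share a bucket are treated as tied, so more buckets give a finer curve at the cost of a slightly larger result for the driver.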
>> 
>> 
>> --
>> Ing. Marco Colombo
> 
> 
> 
> 
> -- 
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
> 
> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
> GitHub: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
> 
>
