Re: spark and plot data

Gourav Sengupta Sat, 23 Jul 2016 02:27:07 -0700

Hi Pedro,

Toree is a Scala kernel for Jupyter, in case anyone needs a short intro. I use
it regularly (when I am not using IntelliJ) and it's quite good.


Regards,
Gourav

On Fri, Jul 22, 2016 at 11:15 PM, Pedro Rodriguez <[email protected]>
wrote:

> As of the most recent 0.6.0 release it's partially alleviated, but still
> not great (compared to something like Jupyter).
>
> They can be "downloaded", but that's only really meaningful for importing
> them back into Zeppelin. It would be great if they could be exported as HTML
> or PDF, but at present they can't be. I know Zeppelin has some sort of git
> support, but it was never clear to me how it was supposed to be used, since
> the docs are sparse on that. So far what works best for us is S3 storage,
> but with that you don't get the benefits of GitHub (history, commits, etc.).
>
> There are a couple of other notebooks floating around; Apache Toree seems the
> most promising for portability since it's based on Jupyter:
> https://github.com/apache/incubator-toree
>
> On Fri, Jul 22, 2016 at 3:53 PM, Gourav Sengupta <
> [email protected]> wrote:
>
>> The biggest stumbling block to using Zeppelin has been that we cannot
>> download the notebooks, cannot export them, and certainly cannot sync them
>> back to GitHub without mind-numbing and sometimes irritating hacks. Have
>> those issues been resolved?
>>
>>
>> Regards,
>> Gourav
>>
>>
>> On Fri, Jul 22, 2016 at 2:22 PM, Pedro Rodriguez <[email protected]>
>> wrote:
>>
>>> Zeppelin works great. The other thing we have done in notebooks that
>>> support multiple kinds of Spark session (like Zeppelin or Databricks) is
>>> register Spark SQL temp tables in our Scala code, then use Python as an
>>> escape hatch for plotting with seaborn/matplotlib when the built-in plots
>>> are insufficient.
>>>
>>> —
>>> Pedro Rodriguez
>>> PhD Student in Large-Scale Machine Learning | CU Boulder
>>> Systems Oriented Data Scientist
>>> UC Berkeley AMPLab Alumni
>>>
>>> pedrorodriguez.io | 909-353-4423
>>> github.com/EntilZha | LinkedIn
>>> <https://www.linkedin.com/in/pedrorodriguezscience>
>>>
>>> On July 22, 2016 at 3:04:48 AM, Marco Colombo (
>>> [email protected]) wrote:
>>>
>>> Take a look at Zeppelin:
>>>
>>> http://zeppelin.apache.org
>>>
>>> On Thursday, July 21, 2016, Andy Davidson <[email protected]>
>>> wrote:
>>>
>>>> Hi Pseudo
>>>>
>>>> Plotting, graphing, data visualization, report generation are common
>>>> needs in scientific and enterprise computing.
>>>>
>>>> Can you tell me more about your use case? What is it about the current
>>>> process/workflow that you think could be improved by pushing plotting (I
>>>> assume you mean plotting and graphing) into Spark?
>>>>
>>>>
>>>> In my own work, all the graphing is done in the driver on summary
>>>> stats calculated using Spark, so using standard Python libraries has not
>>>> been a problem for me.
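A minimal sketch of the pattern Andy describes, assuming the values to plot are numeric: the expensive step is bucketing the data into a fixed number of bins, which Spark can do on the cluster (PySpark's `RDD.histogram` does this kind of equal-width bucketing), so only the bin edges and counts ever reach the driver. The `bin_counts` function below is a hypothetical plain-Python stand-in for that aggregation, following the same convention of right-open bins with the last bin closed.

```python
def bin_counts(values, lo, hi, nbins):
    """Count values into nbins equal-width bins spanning [lo, hi].

    Hypothetical helper: a plain-Python stand-in for an aggregation that
    Spark would run on the cluster. Bins are right-open except the last,
    which also includes hi; values outside [lo, hi] are skipped.
    Returns (edges, counts) with len(edges) == nbins + 1.
    """
    if nbins <= 0 or hi <= lo:
        raise ValueError("need nbins > 0 and hi > lo")
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in values:
        if v < lo or v > hi:
            continue  # out of range: not counted
        # min() puts v == hi into the last bin instead of a nonexistent one
        index = min(int((v - lo) / width), nbins - 1)
        counts[index] += 1
    edges = [lo + k * width for k in range(nbins + 1)]
    return edges, counts


edges, counts = bin_counts([0.5, 1.2, 3.3, 9.9, 10.0], lo=0, hi=10, nbins=5)
# counts == [2, 1, 0, 0, 2]
```

The returned `edges` and `counts` stay at nbins + 1 and nbins numbers no matter how many rows went in, so they can be handed to matplotlib (for example as a bar chart) on the driver without a large `collect()`.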
>>>>
>>>> Andy
>>>>
>>>> From: pseudo oduesp <[email protected]>
>>>> Date: Thursday, July 21, 2016 at 8:30 AM
>>>> To: "user @spark" <[email protected]>
>>>> Subject: spark and plot data
>>>>
>>>> Hi,
>>>> I know Spark is an engine for computing large data sets. I work with
>>>> PySpark and it's a wonderful tool.
>>>>
>>>> My question: we don't have tools for plotting data, so each time we have
>>>> to switch back to Python to plot. But when you have a large result, such
>>>> as a scatter plot or a ROC curve, you can't use collect() to bring the
>>>> data back to the driver.
>>>>
>>>> Does anyone have a suggestion for plotting?
>>>>
>>>> Thanks
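For the ROC case the same idea applies: instead of collecting every (score, label) pair, bucket the scores on the cluster, count positives and negatives per bucket, and build the curve from that small table on the driver. This is a minimal sketch assuming the per-bucket counts have already been aggregated (in PySpark, for instance, via a `groupBy` on a bucketed score column); the `binned_roc` and `trapezoid_auc` names and the dict input format are hypothetical choices for illustration. Spark MLlib's Scala `BinaryClassificationMetrics` also accepts a `numBins` argument that down-samples its curves in a similar spirit.

```python
def binned_roc(bucket_counts):
    """Build ROC points from per-bucket (positives, negatives) counts.

    bucket_counts: dict mapping a score bucket (higher bucket = higher score)
    to a (positives, negatives) tuple. Hypothetical input format: in a Spark
    job this small table would come from an aggregation over scored rows.
    Returns (false_positive_rate, true_positive_rate) points, starting at
    (0.0, 0.0) and ending at (1.0, 1.0).
    """
    total_pos = sum(pos for pos, _ in bucket_counts.values())
    total_neg = sum(neg for _, neg in bucket_counts.values())
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one bucket at a time, from the highest score down:
    # every row in the current bucket or above is predicted positive.
    for bucket in sorted(bucket_counts, reverse=True):
        pos, neg = bucket_counts[bucket]
        tp += pos
        fp += neg
        points.append((fp / total_neg, tp / total_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as x-sorted (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

Each point corresponds to one threshold, so the curve has one point per bucket plus the origin, and the number of buckets directly controls the resolution of the plotted curve. For a scatter plot, the analogous approach is to plot a random sample of the rows (for example via `DataFrame.sample`) rather than the full result.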
>>>>
>>>>
>>>
>>> --
>>> Ing. Marco Colombo
>>>
>>>
>>
>
>
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> [email protected] | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
>
>
