Re: spark and plot data

andy petrella Sat, 23 Jul 2016 03:17:10 -0700

Heya,

It might be worth checking out the Spark Notebook <http://spark-notebook.io/>.
It offers custom, reactive, dynamic charts (scatter, line, bar, pie, graph,
radar, parallel, pivot, …) for any kind of data through an intuitive and
easy Scala API, with server-side (including Spark-based) sampling if needed.
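The server-side sampling mentioned above is what keeps a chart usable when the underlying result is large. Spark Notebook's own sampling code isn't shown in this thread, so what follows is only a minimal sketch of one standard technique: reservoir sampling (Algorithm R). It keeps a uniform random sample of at most k points from a stream of unknown length using O(k) memory. The function name and the synthetic data are hypothetical. On the Spark side, PySpark's built-in `DataFrame.sample(fraction=...)` and `RDD.takeSample(False, k)` serve a similar purpose before anything is collected to the driver.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of at most k items (Algorithm R).

    Memory stays O(k) however many items are streamed, so a chart only ever
    receives a bounded number of points.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical data: 100,000 (x, y) points, more than a chart should draw.
    points = ((x, (x * 37) % 101) for x in range(100_000))
    sample = reservoir_sample(points, k=1000, seed=42)
    print(len(sample))  # 1000
```

Because every item has the same k/n chance of ending up in the sample, a thinned scatter plot keeps the overall shape of the data, though rare outliers may be dropped.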


Many charts are available natively. You can check this repo
<https://github.com/data-fellas/scala-for-data-science> (especially the
notebook named "Why Spark Notebook"), and if you’re familiar with Docker, you
can even simply do the following (and use Spark 2.0):

docker pull datafellas/scala-for-data-science:1.0-spark2
docker run --rm -it --net=host -m 8g datafellas/scala-for-data-science:1.0-spark2 bash

<https://github.com/data-fellas/scala-for-data-science#start-the-services>

For any questions, you can poke the community live on our Gitter
<https://gitter.im/andypetrella/spark-notebook?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge>
or on GitHub <https://github.com/andypetrella/spark-notebook>, of course.
HTH
andy

On Sat, Jul 23, 2016 at 11:26 AM Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Hi Pedro,
>
> Toree is a Scala kernel for Jupyter, in case anyone needs a short intro. I
> use it regularly (when I am not using IntelliJ) and it's quite good.
>
> Regards,
> Gourav
>
> On Fri, Jul 22, 2016 at 11:15 PM, Pedro Rodriguez <ski.rodrig...@gmail.com>
> wrote:
>
>> As of the most recent release (0.6.0) it's partially alleviated, but still
>> not great compared to something like Jupyter.
>>
>> They can be "downloaded", but that's only really meaningful for importing
>> them back into Zeppelin. It would be great if they could be exported as
>> HTML or PDF, but at present they can't be. I know they have some sort of
>> git support, but it was never clear to me how it was supposed to be used,
>> since the docs are sparse on that. So far what works best for us is S3
>> storage, but that way you don't get the benefits of GitHub (history,
>> commits, etc.).
>>
>> There are a couple of other notebooks floating around; Apache Toree seems
>> the most promising for portability since it's based on Jupyter:
>> https://github.com/apache/incubator-toree
>>
>> On Fri, Jul 22, 2016 at 3:53 PM, Gourav Sengupta <
>> gourav.sengu...@gmail.com> wrote:
>>
>>> The biggest stumbling block to using Zeppelin has been that we cannot
>>> download the notebooks, cannot export them, and certainly cannot sync them
>>> back to GitHub without mind-numbing and sometimes irritating hacks. Have
>>> those issues been resolved?
>>>
>>>
>>> Regards,
>>> Gourav
>>>
>>>
>>> On Fri, Jul 22, 2016 at 2:22 PM, Pedro Rodriguez <
>>> ski.rodrig...@gmail.com> wrote:
>>>
>>>> Zeppelin works great. The other thing we have done in notebooks that
>>>> support multiple types of Spark session (like Zeppelin or Databricks) is
>>>> register Spark SQL temp tables in our Scala code, then use Python as an
>>>> escape hatch for plotting with seaborn/matplotlib when the built-in plots
>>>> are insufficient.
>>>>
>>>> —
>>>> Pedro Rodriguez
>>>> PhD Student in Large-Scale Machine Learning | CU Boulder
>>>> Systems Oriented Data Scientist
>>>> UC Berkeley AMPLab Alumni
>>>>
>>>> pedrorodriguez.io | 909-353-4423
>>>> github.com/EntilZha | LinkedIn
>>>> <https://www.linkedin.com/in/pedrorodriguezscience>
>>>>
>>>> On July 22, 2016 at 3:04:48 AM, Marco Colombo
>>>> (ing.marco.colo...@gmail.com) wrote:
>>>>
>>>> Take a look at Zeppelin:
>>>>
>>>> http://zeppelin.apache.org
>>>>
>>>> On Thursday, July 21, 2016, Andy Davidson <a...@santacruzintegration.com>
>>>> wrote:
>>>>
>>>>> Hi Pseudo
>>>>>
>>>>> Plotting, graphing, data visualization, report generation are common
>>>>> needs in scientific and enterprise computing.
>>>>>
>>>>> Can you tell me more about your use case? What about the current
>>>>> process/workflow do you think could be improved by pushing plotting (I
>>>>> assume you mean plotting and graphing) into Spark?
>>>>>
>>>>>
>>>>> In my personal work, all the graphing is done in the driver on summary
>>>>> statistics calculated using Spark, so using standard Python libraries
>>>>> has not been a problem for me.
>>>>>
>>>>> Andy
>>>>>
>>>>> From: pseudo oduesp <pseudo20...@gmail.com>
>>>>> Date: Thursday, July 21, 2016 at 8:30 AM
>>>>> To: "user @spark" <user@spark.apache.org>
>>>>> Subject: spark and plot data
>>>>>
>>>>> Hi,
>>>>> I know Spark is an engine for computing on large data sets. I work with
>>>>> PySpark, and it's a wonderful tool.
>>>>>
>>>>> My question: we don't have tools for plotting data, so each time we have
>>>>> to switch back to plain Python to plot. But when you have a large result,
>>>>> such as for a scatter plot or an ROC curve, you can't use collect() to
>>>>> bring the data back.
>>>>>
>>>>> Does anyone have a suggestion for plotting?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>
>>>> --
>>>> Ing. Marco Colombo
>>>>
>>>>
>>>
>>
>>
>> --
>> Pedro Rodriguez
>> PhD Student in Distributed Machine Learning | CU Boulder
>> UC Berkeley AMPLab Alumni
>>
>> ski.rodrig...@gmail.com | pedrorodriguez.io | 909-353-4423
>> Github: github.com/EntilZha | LinkedIn:
>> https://www.linkedin.com/in/pedrorodriguezscience
>>
>>
> 
-- 
andy
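The original question points out that a large scatter plot or ROC curve can't simply be collected, and Andy Davidson's reply describes computing summary statistics with Spark and plotting them on the driver. For the ROC case, here is a minimal sketch of that split, assuming scores in [0, 1] and 0/1 labels. Bucket the scores into a fixed number of equal-width bins and count positives and negatives per bin. In Spark that would be a groupBy on the bin index followed by two sums, so only one row per bin reaches the driver. Then sweep the bins from highest to lowest score on the driver to get (FPR, TPR) points. The plain-Python counting loop stands in for the Spark aggregation, and all function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def bin_counts(scored: Iterable[Tuple[float, int]], num_bins: int) -> List[Tuple[int, int]]:
    """Count (positives, negatives) in each equal-width score bin on [0, 1].

    Stand-in for the distributed step: in Spark this would be a groupBy on the
    bin index with two sums, so only num_bins rows reach the driver.
    """
    counts = [[0, 0] for _ in range(num_bins)]
    for score, label in scored:
        # Clamp into [0, num_bins - 1]; a score of exactly 1.0 would
        # otherwise fall past the last bin.
        b = min(max(int(score * num_bins), 0), num_bins - 1)
        if label == 1:
            counts[b][0] += 1
        else:
            counts[b][1] += 1
    return [(pos, neg) for pos, neg in counts]


def roc_from_bins(counts: List[Tuple[int, int]]) -> List[Tuple[float, float]]:
    """Driver-side step: turn per-bin counts into (FPR, TPR) points.

    Lowers the threshold one bin at a time, starting from the highest scores,
    and accumulates true and false positives as it goes.
    """
    total_pos = sum(pos for pos, _ in counts)
    total_neg = sum(neg for _, neg in counts)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    tp = fp = 0
    for pos, neg in reversed(counts):
        tp += pos
        fp += neg
        points.append((fp / total_neg, tp / total_pos))
    return points


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear curve through the ROC points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


if __name__ == "__main__":
    # Toy (score, label) pairs; in practice these would stay in Spark.
    scored = [(0.9, 1), (0.8, 1), (0.7, 0), (0.4, 1), (0.2, 0), (0.1, 0)]
    roc = roc_from_bins(bin_counts(scored, num_bins=10))
    print(len(roc))  # 11 points: the origin plus one per bin
    print(round(trapezoid_auc(roc), 4))  # 0.8889 (8/9) for this toy data
```

The number of curve points is bounded by `num_bins + 1` regardless of data size, at the cost of treating all scores within a bin as tied. On the Scala side, Spark MLlib's `BinaryClassificationMetrics` also accepts a `numBins` argument that downsamples its `roc()` output in a similar spirit; check the documentation for your Spark version.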
