Hi Pedro,

Toree is a Scala kernel for Jupyter, in case anyone needs a short intro. I use it regularly (when I am not using IntelliJ) and it's quite good.

Regards,
Gourav

On Fri, Jul 22, 2016 at 11:15 PM, Pedro Rodriguez <[email protected]> wrote:

> As of the most recent 0.6.0 release it's partially alleviated, but still
> not great (compared to something like Jupyter).
>
> They can be "downloaded", but that is only really meaningful for importing
> them back into Zeppelin. It would be great if they could be exported as HTML
> or PDF, but at present they can't be. I know they have some sort of Git
> support, but it was never clear to me how it was supposed to be used, since
> the docs are sparse on that. So far what works best for us is S3 storage,
> but you don't get the benefits of GitHub that way (history, commits, etc.).
>
> There are a couple of other notebooks floating around. Apache Toree seems the
> most promising for portability, since it's based on Jupyter:
> https://github.com/apache/incubator-toree
>
> On Fri, Jul 22, 2016 at 3:53 PM, Gourav Sengupta <
> [email protected]> wrote:
>
>> The biggest stumbling block to using Zeppelin has been that we cannot
>> download the notebooks, cannot export them, and certainly cannot sync them
>> back to GitHub without mind-numbing and sometimes irritating hacks. Have
>> those issues been resolved?
>>
>>
>> Regards,
>> Gourav
>>
>>
>> On Fri, Jul 22, 2016 at 2:22 PM, Pedro Rodriguez <[email protected]
>> > wrote:
>>
>>> Zeppelin works great. The other thing we have done in notebooks
>>> (like Zeppelin or Databricks) that support multiple types of Spark session
>>> is to register Spark SQL temp tables in our Scala code, then escape hatch
>>> to Python for plotting with seaborn/matplotlib when the built-in plots are
>>> insufficient.
>>>
>>> --
>>> Pedro Rodriguez
>>> PhD Student in Large-Scale Machine Learning | CU Boulder
>>> Systems Oriented Data Scientist
>>> UC Berkeley AMPLab Alumni
>>>
>>> pedrorodriguez.io | 909-353-4423
>>> github.com/EntilZha | LinkedIn
>>> <https://www.linkedin.com/in/pedrorodriguezscience>
>>>
>>> On July 22, 2016 at 3:04:48 AM, Marco Colombo (
>>> [email protected]) wrote:
>>>
>>> Take a look at Zeppelin:
>>>
>>> http://zeppelin.apache.org
>>>
>>> On Thursday, July 21, 2016, Andy Davidson <[email protected]>
>>> wrote:
>>>
>>>> Hi Pseudo,
>>>>
>>>> Plotting, graphing, data visualization, and report generation are common
>>>> needs in scientific and enterprise computing.
>>>>
>>>> Can you tell me more about your use case? What is it about the current
>>>> process / workflow that you think could be improved by pushing plotting (I
>>>> assume you mean plotting and graphing) into Spark?
>>>>
>>>>
>>>> In my personal work, all the graphing is done in the driver on summary
>>>> stats calculated using Spark. So for me, using standard Python libs has
>>>> not been a problem.
>>>>
>>>> Andy
>>>>
>>>> From: pseudo oduesp <[email protected]>
>>>> Date: Thursday, July 21, 2016 at 8:30 AM
>>>> To: "user @spark" <[email protected]>
>>>> Subject: spark and plot data
>>>>
>>>> Hi,
>>>> I know Spark is an engine for computing on large data sets. I work with
>>>> PySpark, and it's a wonderful machine.
>>>>
>>>> My question: we don't have tools for plotting data, so each time we have
>>>> to switch back to Python to use a plotting library. But when the result
>>>> is large, e.g. for a scatter plot or an ROC curve, you can't use collect()
>>>> to bring the data back.
>>>>
>>>> Does anyone have a suggestion for plotting?
>>>>
>>>> Thanks
>>>>
>>>>
>>>
>>>
>>> --
>>> Ing. Marco Colombo
>>>
>>>
>>
>
>
> --
> Pedro Rodriguez
> PhD Student in Distributed Machine Learning | CU Boulder
> UC Berkeley AMPLab Alumni
>
> [email protected] | pedrorodriguez.io | 909-353-4423
> Github: github.com/EntilZha | LinkedIn:
> https://www.linkedin.com/in/pedrorodriguezscience
>
>
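
For the ROC-curve case in the original question, one way to follow Andy's approach (compute summary stats in Spark, plot in the driver) is to bucket the scores and count positives and negatives per bucket on the cluster. That way only at most n_bins rows are collected instead of the full result. Below is a minimal sketch, assuming a PySpark DataFrame with a numeric `score` column in [0, 1] and a 0/1 integer `label` column. The function and column names are hypothetical, not from anyone's setup in this thread, and the binning approximates the exact ROC curve at n_bins thresholds.

```python
"""Sketch: aggregate in Spark, then build and plot an ROC curve on the driver.

Assumptions (hypothetical): a PySpark DataFrame with a numeric `score`
column in [0, 1] and a 0/1 integer `label` column.
"""


def bin_counts_in_spark(df, score_col="score", label_col="label", n_bins=100):
    """Count positives/negatives per score bin on the cluster.

    Only the aggregated rows (at most n_bins) are collected to the driver.
    """
    # Imported lazily so the pure-Python helpers below work without Spark.
    from pyspark.sql import functions as F

    # Map a score in [0, 1] to a bin index 0..n_bins-1 (1.0 goes to the top bin).
    bin_index = F.least(F.floor(F.col(score_col) * n_bins), F.lit(n_bins - 1)).cast("int")
    rows = (
        df.withColumn("bin", bin_index)
        .groupBy("bin")
        .agg(
            F.sum(F.col(label_col)).alias("pos"),
            F.sum(1 - F.col(label_col)).alias("neg"),
        )
        .collect()
    )
    return [(r["bin"], int(r["pos"]), int(r["neg"])) for r in rows]


def roc_from_bins(bins):
    """Turn (bin_index, positives, negatives) tuples into (FPR, TPR) points.

    Higher bin index means higher score; each bin boundary is one threshold.
    """
    ordered = sorted(bins, key=lambda b: b[0], reverse=True)
    total_pos = sum(b[1] for b in ordered)
    total_neg = sum(b[2] for b in ordered)
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, pos, neg in ordered:
        # Lowering the threshold past this bin marks all its rows as positive.
        tp += pos
        fp += neg
        points.append((fp / total_neg, tp / total_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC points using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def plot_roc(points):
    """Plot the (small) list of ROC points with matplotlib on the driver."""
    import matplotlib.pyplot as plt  # lazy import; only the summary is plotted

    fpr, tpr = zip(*points)
    plt.plot(fpr, tpr, marker=".", label="model")
    plt.plot([0, 1], [0, 1], linestyle="--", label="random")
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()
```

In a PySpark session this would be used roughly as plot_roc(roc_from_bins(bin_counts_in_spark(predictions))), where `predictions` is a hypothetical DataFrame of scored examples. The same idea (aggregate on the cluster, collect only the summary) should carry over to large scatter plots, e.g. binning both axes into a 2-D histogram or plotting a DataFrame.sample() subset.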
