Bernhard Walter created ZEPPELIN-2160:
-----------------------------------------

             Summary: PySpark: Matplotlib Integration extremely slow
                 Key: ZEPPELIN-2160
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2160
             Project: Zeppelin
          Issue Type: Bug
          Components: front-end, GUI
    Affects Versions: 0.7.0
            Reporter: Bernhard Walter


*Issue:*

I tested the matplotlib integration in PySpark. As a baseline, each of the following 3 examples took 1 - 2 seconds in Jupyter on the same machine.

{code}
%pyspark
import matplotlib.pyplot as plt
plt.plot([1,2,3,4])
plt.ylabel('some numbers')
z.show(plt)
{code}
==> 1 sec

{code}
%pyspark
import numpy as np
import matplotlib.pyplot as plt

# Fixing random state for reproducibility
np.random.seed(19680801)

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

# the histogram of the data
n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75)

plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title('Histogram of IQ')
plt.text(60, .025, r'$\mu=100,\ \sigma=15$')
plt.axis([40, 160, 0, 0.03])
plt.grid(True)
plt.show()
{code}
==> 11 sec

{code}
%pyspark
from ggplot import *
ggplot(diamonds, aes(x='price', fill='cut')) +\
    geom_density(alpha=0.25) +\
    facet_wrap("clarity")
{code}
==> 138 sec

*Environment:*

Downloaded http://apache.mirror.digionline.de/zeppelin/zeppelin-0.7.0/zeppelin-0.7.0-bin-netinst.tgz and installed the spark, python, sh, md and angular interpreters.

Started via bin/zeppelin.sh
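
One way to narrow down where the time goes might be to time the rendering step on its own, outside any notebook, and compare that figure with the paragraph run time. The sketch below is not part of the original report. The helper names `time_call` and `render_simple_plot` are hypothetical, and it assumes Python 3 and matplotlib's non-interactive Agg backend. It only measures drawing the first example and serializing it to an in-memory PNG, so it says nothing about what Zeppelin does with the image afterwards.

```python
import io
import time


def time_call(fn, repeat=3):
    """Run fn() `repeat` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def render_simple_plot():
    """Draw the first example and return the size in bytes of the resulting PNG."""
    import matplotlib
    matplotlib.use("Agg")  # assumption: headless backend, no GUI involved
    import matplotlib.pyplot as plt

    fig = plt.figure()
    plt.plot([1, 2, 3, 4])
    plt.ylabel("some numbers")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # serialize in memory, no disk I/O
    plt.close(fig)  # release the figure so repeated runs do not accumulate
    return buf.getbuffer().nbytes


try:
    print("render only: %.3f s" % time_call(render_simple_plot))
except ImportError:
    print("matplotlib is not installed; skipping the render timing")
```

If the render-only time stays well below the paragraph time seen in the notebook, that would suggest the delay comes from somewhere after rendering (for example transferring or displaying the image) rather than from matplotlib itself.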