Nawid Sayed created ZEPPELIN-4358:
-------------------------------------

             Summary: Seaborn renders plots slowly in apache zeppelin notebooks
                 Key: ZEPPELIN-4358
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4358
             Project: Zeppelin
          Issue Type: Bug
          Components: pySpark
    Affects Versions: 0.8.1
            Reporter: Nawid Sayed

I am currently trying to generate visualizations in Zeppelin (0.8.1) notebooks using the pyspark interpreter with Python 3.7.3. Generating the following simple plot with seaborn (0.9.0) takes around 5 minutes, with very high CPU usage for the entire duration:

```
%pyspark
import seaborn as sns
import numpy as np
import pandas as pd

# 100 rows x 3 columns of uniform random values
data = pd.DataFrame(np.random.rand(100, 3))
sns.pairplot(data)
```

This behavior is inconsistent, as the following (much more data-intensive) plot is rendered instantly:

```
%pyspark
import seaborn as sns
import numpy as np
import pandas as pd

# 10000 rows x 2 columns of uniform random values
df = pd.DataFrame(data=np.random.rand(10000, 2))
sns.lineplot(x=0, y=1, data=df)
```

I noticed that using matplotlib (3.1.0) is generally much faster and almost as snappy as I am used to from Jupyter notebook environments.

I have already read about issue [ZEPPELIN-1894](https://jira.apache.org/jira/browse/ZEPPELIN-1894), but I can render the scatterplot mentioned there instantly as well.

I already posted my question on StackOverflow, but I think this is a better place for it:

--
This message was sent by Atlassian Jira
(v8.3.4#803005)