I think for sure SPARK-28547
<https://issues.apache.org/jira/projects/SPARK/issues/SPARK-28547>  
At the moment there are some flaws in the Spark architecture: it performs
miserably or even freezes wherever the number of columns exceeds 10-15K
(even a simple describe call takes ages, while the same functions in
pandas, without Spark, take seconds). In many fields (like bioinformatics) wide
datasets with large numbers of both rows and columns are very common (gene
expression data is a good example), and Spark is totally useless there.



