Re: Spark 3.0 preview release feature list and major changes

antonkulaga Thu, 10 Oct 2019 12:31:02 -0700

I think SPARK-28547
<https://issues.apache.org/jira/projects/SPARK/issues/SPARK-28547>
should definitely be on the list.
At the moment there are some flaws in the Spark architecture: it performs
miserably, or even freezes, whenever the number of columns exceeds 10-15K
(even a simple describe call takes ages, while the same operation in
pandas, without Spark, takes seconds). In many fields, such as
bioinformatics, wide datasets with large numbers of both rows and columns
are very common (gene expression data is a good example), and Spark is
totally useless there.




