Alternatives for dataframe collectAsList()

szep.laszlo.it Wed, 29 Mar 2017 12:00:21 -0700

Hi,

After creating a dataset:

Dataset<Row> df = sqlContext.sql("query");

I need the result values, so I call collectAsList():

List<Row> list = df.collectAsList();

But it is very slow when I work with large datasets (20-30 million
records). I know the result is not yet present in the driver application,
and that this is why it takes a long time: collectAsList() collects all
the data from the worker nodes.

But then what is the right way to get the result values? Is there another
way to iterate over the rows of a result dataset, or to read their values?
Can anyone post a small, working example?
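
To make the difference concrete, here is a minimal plain-Java sketch. It
does not use Spark at all: the class RowAccessSketch and its methods are
hypothetical names, and the "rows" are plain long values. It only contrasts
holding every row in a list (what collectAsList() does on the driver) with
consuming rows one at a time through an Iterator. In Spark itself,
Dataset.toLocalIterator() returns a java.util.Iterator over the rows, and
foreachPartition() keeps the processing on the workers; whether either is
fast enough for 20-30 million records is something I have not measured.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.LongStream;

// Plain-Java stand-in, no Spark required: each "row" is just a long value.
final class RowAccessSketch {

    // Hypothetical source of n rows, produced lazily one at a time.
    static Iterator<Long> rows(long n) {
        return LongStream.range(0, n).boxed().iterator();
    }

    // collectAsList()-style: every row is held in memory before any is used.
    static long sumViaList(long n) {
        List<Long> all = new ArrayList<>();
        rows(n).forEachRemaining(all::add);
        long sum = 0;
        for (long v : all) {
            sum += v;
        }
        return sum;
    }

    // Iterator-style: each row is consumed and can then be discarded.
    static long sumViaIterator(long n) {
        long sum = 0;
        Iterator<Long> it = rows(n);
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both approaches compute the same value; only memory use differs.
        System.out.println(sumViaList(1_000));
        System.out.println(sumViaIterator(1_000));
    }
}
```

Both methods return the same sum; the second never keeps more than one row
at a time, which is the property I am hoping to get from Spark.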

Thanks & Regards,
Laszlo Szep



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Alternatives-for-dataframe-collectAsList-tp28547.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
