Hi,

After creating a dataset with

    Dataset<Row> df = sqlContext.sql("query");

I need to read the result values, so I call collectAsList():

    List<Row> list = df.collectAsList();

This is very slow on large datasets (20-30 million records). I understand why: the result is not held in the driver application, so collectAsList() has to pull all the data from the worker nodes to the driver.

What is the right way to get the result values, then? Is there another way to iterate over the rows of a result dataset, or to read its values? Could anyone post a small, working example?

Thanks & Regards,
Laszlo Szep

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Alternatives-for-dataframe-collectAsList-tp28547.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
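
One direction that may help with the question above: Dataset.toLocalIterator() returns a java.util.Iterator and brings rows to the driver one partition at a time, so the driver does not need to hold the whole result as a List the way collectAsList() does. The sketch below shows only the driver-side consumption pattern in plain Java, so it runs without a cluster. The class name RowStreaming, the method processAll, and the sample data are hypothetical, and the Spark call appears only in a comment as assumed wiring.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class RowStreaming {

    // Hands each element to the handler as it arrives and returns the number processed.
    // Nothing is accumulated here, so memory use in this method does not grow
    // with the size of the result.
    public static <T> long processAll(Iterator<T> rows, Consumer<T> handler) {
        long count = 0;
        while (rows.hasNext()) {
            handler.accept(rows.next());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // With Spark, the iterator would come from the dataset (assumed wiring, not run here):
        //   Iterator<Row> rows = df.toLocalIterator();
        // Stand-in data so the sketch runs on its own:
        List<String> sample = Arrays.asList("a", "b", "c");
        long n = processAll(sample.iterator(), value -> System.out.println(value));
        System.out.println("processed " + n + " rows");
    }
}
```

This still moves every row through the driver, so it mainly addresses memory pressure rather than transfer time. If the rows are only needed for an aggregate or for saving somewhere, it is probably better to keep the work on the executors, for example by aggregating in the query itself, using foreachPartition(), or writing the result out with df.write().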