date:20160901

Is Spark's KMeans unable to handle bigdata?

2016-09-01 Thread Georgios Samaras

Dear all, the random initialization works well, but the default initialization, k-means||, has given me trouble. I also heard of people struggling with it a year ago; everybody would just skip it and use random, but I cannot keep quiet about it! I have posted a minimal example

Dynamic Partitions When Writing Parquet

2016-09-01 Thread Liz Bai

Hi there, I have a question about writing Parquet using Spark SQL. Spark 1.4 already supports writing DataFrames as Parquet files with “partitionBy(colNames: String*)”, since SPARK-6561 was fixed. Is there any method or plan to write Parquet with dynamic partitions? For example, instead of partiti