scala> spark.read.schema(StructType(Seq(StructField("_1", StringType, false),
  StructField("_2", StringType, true)))).parquet
  ("hdfs://---/MY_DIRECTORY/*_1=201812030900*").show()
+----+--------------------+
|  _1|                  _2|
+----+--------------------+
|null|ba1ca2dc033440125...|
|null|ba1ca2dc033440125...|
+----+--------------------+
Dear all,
I am using Spark 2 to cluster data with the K-means algorithm.
My input data is flat, but K-means requires sparse vectors with keys in
ascending order. Here is an example of an input and the expected output:
[id, key, value]
[1, 10, 100]
[1, 30, 300]
[2, 40, 400]
[1, 20, 200]
[id, list of (key, value) pairs sorted by key]
[1, [(10, 100), (20, 200), (30, 300)]]
[2, [(40, 400)]]
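One way to sketch the transformation above: group the flat rows by id, then sort each group's (key, value) pairs by key. This uses plain Scala collections standing in for the distributed API (in Spark 2 the equivalent would be `groupBy("id")` with `sort_array(collect_list(struct(...)))`, or an RDD `groupByKey`, feeding `Vectors.sparse`); the object and method names here are illustrative, not from the original post.

```scala
object SparseVectorPrep {
  // A flat input row: (id, key, value), as in the example above.
  type Row = (Int, Int, Double)

  // Group rows by id and sort each id's (key, value) entries by key,
  // producing the ordered sparse-vector entries K-means expects.
  def toSparse(rows: Seq[Row]): Map[Int, Seq[(Int, Double)]] =
    rows
      .groupBy(_._1) // group by id
      .map { case (id, rs) =>
        id -> rs.map(r => (r._2, r._3)).sortBy(_._1) // sort entries by key
      }

  def main(args: Array[String]): Unit = {
    val input = Seq((1, 10, 100.0), (1, 30, 300.0), (2, 40, 400.0), (1, 20, 200.0))
    // Map(1 -> Seq((10,100.0), (20,200.0), (30,300.0)), 2 -> Seq((40,400.0)))
    println(toSparse(input))
  }
}
```

Each resulting `Seq[(Int, Double)]` can then be split into index and value arrays for `Vectors.sparse(size, indices, values)`.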