Hi,
I create a DataFrame from a schema, but when I try to fit a model on it, I
get this error:
requirement failed: Column features must be of type
org.apache.spark.mllib.linalg.VectorUDT@f71b0bce but was actually
ArrayType(StringType,true)
#### piece of code ####
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.ml.feature.Tokenizer;
import org.apache.spark.ml.regression.LinearRegression;
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
SQLContext sqlContext = SQLContext.getOrCreate(rdd.context());
StructType schema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("id", DataTypes.StringType, false),
    DataTypes.createStructField("date", DataTypes.StringType, false),
    DataTypes.createStructField("temperature", DataTypes.StringType, true)
});
// Data arrives from another application as lines of: id,date,temperature
// The (Object[]) cast makes it explicit that each split field is one Row value.
JavaRDD<Row> rowsRdd = rdd.map(e -> RowFactory.create((Object[]) e.split(",")));
DataFrame df = sqlContext.createDataFrame(rowsRdd, schema);
LinearRegression lr = new LinearRegression()
    .setMaxIter(10)
    .setRegParam(0.3)
    .setElasticNetParam(0.8);
Tokenizer tokenizer = new Tokenizer()
    .setInputCol("temperature")
    .setOutputCol("features");
// The problem is here:
// Tokenizer writes an array of strings to "features", which matches the
// ArrayType(StringType,true) reported in the error when fit() is called.
DataFrame result = tokenizer.transform(df);
LinearRegressionModel lrModel = lr.fit(result);
######################
I don't know how to do this, or why I need a label field when I am only
trying to turn the features column into a vector.
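
For context, the error says the "features" column must hold numeric vectors, while the Tokenizer output is an array of strings. spark.ml's LinearRegression also reads its target from a separate numeric column, named "label" by default, which is likely why a label field is requested. The plain-Java sketch below shows only the parsing step, under these assumptions: the temperature is the single feature (as in the code above), the class and method names are hypothetical, and nothing here is Spark API. Which value should serve as the label is not clear from this data, so the sketch leaves it out.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Spark.
final class FeatureParsingSketch {

    // Splits one "id,date,temperature" line and returns the temperature as a
    // one-element numeric feature array.
    static double[] parseFeatures(String line) {
        // The -1 limit keeps a trailing empty field, so "a,b," still has 3 parts.
        String[] parts = line.split(",", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected id,date,temperature but got: " + line);
        }
        // Double.parseDouble throws NumberFormatException (a subclass of
        // IllegalArgumentException) for empty or non-numeric text.
        double temperature = Double.parseDouble(parts[2].trim());
        // ASSUMPTION: in Spark 1.x this array would then be wrapped with
        // org.apache.spark.mllib.linalg.Vectors.dense(features) and stored in
        // a column declared with the VectorUDT type named in the error.
        return new double[] { temperature };
    }

    public static void main(String[] args) {
        double[] features = parseFeatures("sensor-1,2016-01-01,21.5");
        System.out.println(Arrays.toString(features)); // prints [21.5]
    }
}
```

The point of the sketch is that the values reach the model as doubles rather than strings. A line with a missing temperature is rejected here instead of being passed through as an empty string.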
Thank you in advance.
Regards,
Zakaria