Here's a fragment of code that is intended to convert a Dataset<Row> of features into a Vector of Doubles, for use as the features column for SparkML's DecisionTree algorithm. My current problem is the .map() operation, which refuses to compile. Eclipse reports an error that I'm unable to resolve: "The method map(Function1<Row,U>, Encoder<U>) in the type Dataset<Row> is not applicable for the arguments (new Function<Row,Vector>(){}, Encoder<Vector>)". I'd also appreciate examples of how to use StringIndexer instead of my hand-coded FeatureMapper, or any other suggestions for making ML less painful to do in Java.
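
For context, the FeatureMapper class itself is not shown in this message. The following is only a minimal sketch of what such a mapper could look like, under assumptions: the class body, the numeric-parse fallback, and the first-seen index order are all hypothetical, and only the method name mapStringToDouble comes from the fragment. One instance per column is assumed, matching the featureMapperList.get(i) call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the hand-coded FeatureMapper; the real class
// is not shown in the post, so everything below is an assumption.
class FeatureMapper {
    // Distinct non-numeric label -> index, assigned in first-seen order.
    private final Map<String, Double> indexByLabel = new LinkedHashMap<>();

    // Numeric strings (e.g. a fare of "7.25") are parsed directly;
    // anything else (e.g. "male") gets the next unused index.
    // Assumes s is non-null.
    double mapStringToDouble(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            Double idx = indexByLabel.get(s);
            if (idx == null) {
                idx = (double) indexByLabel.size();
                indexByLabel.put(s, idx);
            }
            return idx;
        }
    }
}
```

Note that this sketch numbers labels in the order they are first seen, whereas StringIndexer by default orders labels by descending frequency, so the two would not necessarily produce the same indices.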

Dataset<Vector> featureDS = incomingDS
    .select(
        "Passenger Class",
        "Sex",
        "No of Siblings or Spouses on Board",
        "No of Parents or Children on Board",
        "Passenger Fare")
    .filter(new FilterFunction<Row>() {
        public boolean call(Row row) throws Exception {
            if (row.getString(0).equals(features[0])) // header
                return false;
            else
                return true;
        }
    })
    .map(new Function<Row, Vector>() {
        public Vector call(Row row) throws Exception {
            double[] v = new double[features.length];
            for (int i = 0; i < features.length; i++) {
                String s = row.getString(i);
                Double d = featureMapperList
                    .get(i)
                    .mapStringToDouble(s);
                v[i] = d;
            }
            Vector featureVec = Vectors.dense(v);
            return featureVec;
        }
    }, Encoders.bean(Vector.class));

Dr. Brad J. Cox
Cell: 703-594-1883
Skype: dr.brad.cox