Here's a fragment of code that is meant to convert a Dataset<Row> of features
into Vectors of doubles, for use as the features column for Spark ML's
DecisionTree algorithm. My current problem is the .map() call, which fails to
compile in Eclipse with an error that I'm unable to resolve:

    The method map(Function1<Row,U>, Encoder<U>) in the type Dataset<Row> is
    not applicable for the arguments (new Function<Row,Vector>(){},
    Encoder<Vector>)

I'd also appreciate examples of how to use StringIndexer instead of my
hand-coded FeatureMapper, and any other suggestions for making ML less painful
to do in Java.

    Dataset<Vector> featureDS = incomingDS
        .select(
            "Passenger Class",
            "Sex",
            "No of Siblings or Spouses on Board",
            "No of Parents or Children on Board",
            "Passenger Fare")
        .filter(new FilterFunction<Row>()
        {
            public boolean call(Row row) throws Exception
            {
                if (row.getString(0).equals(features[0])) // header
                    return false;
                else return true;
            }
        })
        .map(new Function<Row, Vector>()
        {
            public Vector call(Row row) throws Exception
            {
                double[] v = new double[features.length];
                for (int i = 0; i < features.length; i++)
                {
                    String s = row.getString(i);
                    Double d = featureMapperList
                        .get(i)
                        .mapStringToDouble(s);
                    v[i] = d;
                }
                Vector featureVec = Vectors.dense(v);
                return featureVec;
            }
        }, Encoders.bean(Vector.class));
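
For reference on the StringIndexer part of the question: by default,
StringIndexer assigns each distinct string a double index ordered by descending
frequency, so the most frequent label becomes 0.0. The block below is a minimal
plain-Java sketch of that idea, not Spark code and not Spark's implementation.
The class name LabelIndexer is hypothetical, its mapStringToDouble method is
only modeled on the call used in the fragment above, and the alphabetical
tie-break for equally frequent labels is an assumption made here to keep the
result deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Spark): maps each distinct string to a
// double index, with the most frequent string receiving 0.0.
class LabelIndexer {
    private final Map<String, Double> index = new HashMap<>();

    LabelIndexer(List<String> values) {
        // Count how often each label occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }

        // Sort labels by descending count; ties are broken alphabetically
        // (an assumption of this sketch).
        List<String> labels = new ArrayList<>(counts.keySet());
        labels.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });

        // The position in the sorted list becomes the label's index.
        for (int i = 0; i < labels.size(); i++) {
            index.put(labels.get(i), (double) i);
        }
    }

    // Returns the index for a label seen at construction time.
    double mapStringToDouble(String s) {
        Double d = index.get(s);
        if (d == null) {
            throw new IllegalArgumentException("Unseen label: " + s);
        }
        return d;
    }
}
```

Built from a "Sex" column holding "male", "female", "male", such an indexer
would map "male" to 0.0 and "female" to 1.0. An unseen label raises an
exception in this sketch. How unseen labels should be handled in a real
pipeline is a separate design choice.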


Dr. Brad J. Cox    Cell: 703-594-1883 Skype: dr.brad.cox




