Hi everyone. In some specific cases I need to run prediction for a random forest model locally inside a web service, without touching Spark at all. I've achieved that with the previous mllib API (Java 8 syntax):

    // Pairs each prediction with the point's actual label; runs entirely in the local JVM.
    public List<Tuple2<Double, Double>> predictLocally(RandomForestModel model, List<LabeledPoint> data) {
        return data.stream()
            .map(point -> new Tuple2<>(model.predict(point.features()), point.label()))
            .collect(Collectors.toList());
    }

So I have an instance of the trained model and can use it any way I want. The question is whether it's possible to run the following on the driver itself:

    DataFrame predictions = model.transform(test);

AFAIU, test has to be a DataFrame, which means the prediction is going to run on the cluster. The use case for running on the driver is a very small amount of data to predict on: handling it locally is much faster than going through the Spark cluster.

Thank you.

--
Be well!
Jean Morozov