Hi,

I have found that Spark cannot save a generated model to HDFS (I used random forest regression and linear regression for this test). It saves only the data directory, as you can see in the picture below:
[image: inline image 1]

But to load a model I need files from the metadata directory. When I test this application on my Windows file system, it works perfectly (the save generates both folders: metadata and data).

The error:

16/08/11 10:03:10 INFO Methods.RandomForestRegression: Model Saved successfuly
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://10.15.0.144:8020/user/ubuntu/ModelPrediction/metadata
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1277)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
    at org.apache.spark.rdd.RDD.take(RDD.scala:1272)
    at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1312)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
    at org.apache.spark.rdd.RDD.first(RDD.scala:1311)
    at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:129)
    at org.apache.spark.mllib.tree.model.RandomForestModel$.load(treeEnsembleModels.scala:88)
    at org.apache.spark.mllib.tree.model.RandomForestModel.load(treeEnsembleModels.scala)
    at Analytics.Methods.RandomForestRegression.generateModel(RandomForestRegression.java:171)
    at Analytics.Main.main(Main.java:100)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/08/11 10:03:10 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.2.8:4040
16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Stopped
16/08/11 10:03:10 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/11 10:03:10 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/08/11 10:03:10 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/11 10:03:10 INFO util.ShutdownHookManager: Shutdown hook called

My code:

. . .
Double testMSE = predictionAndLabel.map(new Function<Tuple2<Double, Double>, Double>() {
    private static final long serialVersionUID = -2599901384333786032L;

    @Override
    public Double call(Tuple2<Double, Double> pl) {
        Double diff = pl._1() - pl._2();
        return diff * diff;
    }
}).reduce(new Function2<Double, Double, Double>() {
    private static final long serialVersionUID = -2714650221453068489L;

    @Override
    public Double call(Double a, Double b) {
        return a + b;
    }
}) / trainingData.count();

// Save and load model
model.save(sc.sc(), output);
Logger.getLogger(RandomForestRegression.class).info("Model Saved successfuly");
// line 171:
RandomForestModel.load(sc.sc(), output);
Logger.getLogger(RandomForestRegression.class).info("Test Load Model: Success");

Regards,
Zakaria
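PS: For context, the map/reduce block above just computes the mean squared error over (prediction, label) pairs. A minimal plain-Java sketch of the same arithmetic, without Spark (the `MseSketch` class and the `double[]{prediction, label}` pair representation are my own for illustration), would be:

```java
import java.util.List;

public class MseSketch {
    // Mean squared error: square each (prediction - label) difference,
    // sum the squares, divide by the number of pairs.
    static double mse(List<double[]> predictionAndLabel) {
        double sum = 0.0;
        for (double[] pl : predictionAndLabel) {
            double diff = pl[0] - pl[1];
            sum += diff * diff;
        }
        return sum / predictionAndLabel.size();
    }

    public static void main(String[] args) {
        List<double[]> pairs = List.of(
            new double[]{1.0, 1.0},
            new double[]{2.0, 4.0},
            new double[]{3.0, 2.0}
        );
        System.out.println(mse(pairs)); // (0 + 4 + 1) / 3
    }
}
```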