Here you can find more information about the code of my class "RandomForestRegression.java": http://spark.apache.org/docs/latest/mllib-ensembles.html#regression
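As a side note, the `testMSE` map/reduce in the quoted code below just computes the mean of squared prediction errors. A minimal plain-Java sketch of the same quantity (no Spark, with made-up `(prediction, label)` pairs, and a hypothetical class name `MseExample`) looks like this:

```java
import java.util.Arrays;
import java.util.List;

public class MseExample {

    // Mean squared error over (prediction, label) pairs: the same
    // quantity the Spark map/reduce computes, minus the cluster.
    public static double mse(List<double[]> predictionAndLabel, long count) {
        double sum = 0.0;
        for (double[] pl : predictionAndLabel) {
            double diff = pl[0] - pl[1];  // prediction - label
            sum += diff * diff;           // squared error, summed
        }
        return sum / count;               // mean of squared errors
    }

    public static void main(String[] args) {
        // Hypothetical pairs, not taken from the real model's output.
        List<double[]> pairs = Arrays.asList(
                new double[]{2.0, 1.0},   // squared error 1.0
                new double[]{3.0, 5.0});  // squared error 4.0
        System.out.println(mse(pairs, pairs.size()));  // prints 2.5
    }
}
```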
2016-08-11 10:18 GMT+02:00 Zakaria Hili <zakah...@gmail.com>:

> Hi,
>
> I have noticed that Spark cannot save a generated model to HDFS (I used
> random forest regression and linear regression for this test). It saves
> only the data directory, as you can see in the picture below:
>
> [image: inline image 1]
>
> However, to load a model I also need files from the metadata directory.
>
> When I test this application on my Windows file system, it works
> perfectly (the save method generates two folders: metadata and data).
>
> The error:
>
> 16/08/11 10:03:10 INFO Methods.RandomForestRegression: Model saved successfully
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist: hdfs://10.15.0.144:8020/user/ubuntu/ModelPrediction/metadata
>         at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
>         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
>         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
>         at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1277)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.take(RDD.scala:1272)
>         at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1312)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.first(RDD.scala:1311)
>         at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:129)
>         at org.apache.spark.mllib.tree.model.RandomForestModel$.load(treeEnsembleModels.scala:88)
>         at org.apache.spark.mllib.tree.model.RandomForestModel.load(treeEnsembleModels.scala)
>         at Analytics.Methods.RandomForestRegression.generateModel(RandomForestRegression.java:171)
>         at Analytics.Main.main(Main.java:100)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 16/08/11 10:03:10 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.2.8:4040
> 16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
> 16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
> 16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
> 16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Stopped
> 16/08/11 10:03:10 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 16/08/11 10:03:10 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
> 16/08/11 10:03:10 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 16/08/11 10:03:10 INFO util.ShutdownHookManager: Shutdown hook called
>
> My code:
>
> ...
>
> Double testMSE =
>     predictionAndLabel.map(new Function<Tuple2<Double, Double>, Double>() {
>         private static final long serialVersionUID = -2599901384333786032L;
>
>         @Override
>         public Double call(Tuple2<Double, Double> pl) {
>             Double diff = pl._1() - pl._2();
>             return diff * diff;
>         }
>     }).reduce(new Function2<Double, Double, Double>() {
>         private static final long serialVersionUID = -2714650221453068489L;
>
>         @Override
>         public Double call(Double a, Double b) {
>             return a + b;
>         }
>     }) / trainingData.count();
>
> // Save and load model
> model.save(sc.sc(), output);
> Logger.getLogger(RandomForestRegression.class).info("Model saved successfully");
> // line 171:
> RandomForestModel.load(sc.sc(), output);
> Logger.getLogger(RandomForestRegression.class).info("Test Load Model: Success");
>
> ------
> Regards,
> Zakaria

--
Zakaria Hili
Engineering student at ENSEIRB-MATMECA, Bordeaux, France.
Intern at Akka Technologies
6 rue Roger Camboulives – 31100 Toulouse
Les Gémeaux - Bureau G 123 - Tel 07 53 65 36 85
Site: www.akka.eu
Emails: zakariah...@gmail.com, zakaria.h...@enseirb-matmeca.fr, zakaria.h...@akka.eu