You can find more information about the code of my class
"RandomForestRegression.java" here:
http://spark.apache.org/docs/latest/mllib-ensembles.html#regression

2016-08-11 10:18 GMT+02:00 Zakaria Hili <zakah...@gmail.com>:

> Hi,
>
> I noticed that Spark can't fully save a generated model on HDFS (I used
> random forest regression and linear regression for this test).
> It saves only the data directory, as you can see in the picture below:
>
> [image: Inline image 1]
>
> But to load a model, I need the files from the metadata directory.
>
> When I test this application on my Windows file system, it works
> perfectly (this method generates two folders: metadata and data).
>
> The error:
>
> 16/08/11 10:03:10 INFO Methods.RandomForestRegression: Model Saved successfuly
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://10.15.0.144:8020/user/ubuntu/ModelPrediction/metadata
>         at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
>         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
>         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
>         at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
>         at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1277)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.take(RDD.scala:1272)
>         at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1312)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>         at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
>         at org.apache.spark.rdd.RDD.first(RDD.scala:1311)
>         at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:129)
>         at org.apache.spark.mllib.tree.model.RandomForestModel$.load(treeEnsembleModels.scala:88)
>         at org.apache.spark.mllib.tree.model.RandomForestModel.load(treeEnsembleModels.scala)
>         at Analytics.Methods.RandomForestRegression.generateModel(RandomForestRegression.java:171)
>         at Analytics.Main.main(Main.java:100)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 16/08/11 10:03:10 INFO ui.SparkUI: Stopped Spark web UI at http://10.0.2.8:4040
> 16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
> 16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
> 16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
> 16/08/11 10:03:10 INFO cluster.YarnClientSchedulerBackend: Stopped
> 16/08/11 10:03:10 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 16/08/11 10:03:10 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
> 16/08/11 10:03:10 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 16/08/11 10:03:10 INFO util.ShutdownHookManager: Shutdown hook called
>
>
> My code:
>
> ...
>
> Double testMSE =
>         predictionAndLabel.map(new Function<Tuple2<Double, Double>, Double>() {
>             private static final long serialVersionUID = -2599901384333786032L;
>
>             @Override
>             public Double call(Tuple2<Double, Double> pl) {
>                 Double diff = pl._1() - pl._2();
>                 return diff * diff;
>             }
>         }).reduce(new Function2<Double, Double, Double>() {
>             private static final long serialVersionUID = -2714650221453068489L;
>
>             @Override
>             public Double call(Double a, Double b) {
>                 return a + b;
>             }
>         }) / trainingData.count();
>
> // Save and load model
> model.save(sc.sc(), output);
> Logger.getLogger(RandomForestRegression.class).info("Model Saved successfuly");
> RandomForestModel.load(sc.sc(), output); // line 171
> Logger.getLogger(RandomForestRegression.class).info("Test Load Model: Success");
>
>
> ------
> Regards,
> Zakaria
>
>
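
As a side note on the quoted snippet above: the map/reduce computes a mean squared error, but it divides by trainingData.count(), whereas the docs example linked at the top divides by the count of the test set. A minimal plain-Java sketch of the same arithmetic (no Spark; the class name MseSketch and the pair values are hypothetical, for illustration only):

```java
import java.util.Arrays;
import java.util.List;

public class MseSketch {

    // MSE = sum of (prediction - label)^2 over all pairs, divided by the
    // number of pairs -- i.e. the size of the TEST set, not the training set.
    public static double mse(List<double[]> predictionAndLabel) {
        double sum = 0.0;
        for (double[] pl : predictionAndLabel) {
            double diff = pl[0] - pl[1]; // prediction - label, as in the map step
            sum += diff * diff;          // squared errors summed, as in the reduce step
        }
        return sum / predictionAndLabel.size(); // divide by the number of pairs
    }

    public static void main(String[] args) {
        // Hypothetical (prediction, label) pairs, just to exercise the formula.
        List<double[]> pairs = Arrays.asList(
                new double[]{1.0, 1.5},  // squared error 0.25
                new double[]{2.0, 2.0},  // squared error 0.0
                new double[]{3.0, 2.0}); // squared error 1.0
        System.out.println("test MSE = " + mse(pairs));
    }
}
```

In the Spark version, the anonymous Function produces the squared differences and the Function2 sums them; the final division turns the sum into a mean.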



-- 
Zakaria Hili

Engineering student at ENSEIRB-MATMECA, Bordeaux, France.
Intern at Akka Technologies
6 rue Roger Camboulives – 31100 Toulouse
Les Gémeaux - Bureau G 123 - Tel 07 53 65 36 85 - Site: www.akka.eu
Emails:
- zakariah...@gmail.com
- zakaria.h...@enseirb-matmeca.fr
- zakaria.h...@akka.eu
